NOVEL HCV NON-STRUCTURAL POLYPEPTIDE

FIELD OF THE INVENTION 

The present invention relates to polypeptides comprising a mutant non-structural
Hepatitis C virus ("HCV") polypeptide useful in immunogenic compounds
for use against HCV, methods of preparing and using the same, and immunogenic
compositions comprising the same. The present invention also relates to compositions 
comprising (a) a mutant non-structural HCV polypeptide and (b) a viral polypeptide 
that is not a non-structural HCV polypeptide and methods of using these compositions. 

BACKGROUND OF THE INVENTION 

HCV is now recognized as the major agent of chronic hepatitis and liver disease 
worldwide. It is estimated that HCV infects about 400 million people worldwide, 
corresponding to more than 3% of the world population. 
Hepatitis C virus ("HCV") is a small enveloped RNA flavivirus, which contains
a positive-stranded RNA genome of about 10 kilobases. The genome has a single
uninterrupted ORF that encodes a protein of 3010-3011 amino acids. The structural
proteins of HCV include a core protein (C), which is highly immunogenic, as well as
two envelope proteins (E1 and E2), which likely form a heterodimer in vivo; the
non-structural proteins are NS2-NS5. It is known that the NS3 region of the virus is
important for post-translational processing of the polyprotein into individual
proteins, and the NS5 region encodes an RNA-dependent RNA polymerase.

Virus-specific T lymphocytes, along with neutralizing antibodies, are the 
mainstay of the antiviral immune defense in established viral infections. Whereas 
CD8+ cytotoxic T cells eliminate virus-infected cells, CD4+ T helper cells are essential
for the efficient regulation of the antiviral immune response. CD4+ T helper cells
recognize specific antigens as peptides bound to autologous HLA class II molecules
(viral antigens or particles are taken up by professional antigen-presenting cells,
processed to peptides, bound to HLA class II molecules in the lysosomal compartment,
and transported back to the cell surface). Several observations support an important
role of CD4+ T cells in the elimination of HCV infection. Tsai et al., 1997 Hepatology
25:449-458; Diepolder et al., 1995 Lancet 346:1006-1009; Missale et al., 1996 JCI
98:706-714; Botarelli et al., 1993 Gastro 104:580-587; Diepolder et al., 1997 J. Virol.
71:6011. Immunogenic peptides usually have a minimal length of 8-11 amino acids.

However, since the peptide binding groove of HLA class II molecules seems to be open 
at both ends, longer peptides are tolerated. Thus peptides eluted from HLA class II 
molecules are typically in the range of 15-25 amino acids. HLA class II molecules are 
extremely polymorphic and each allele seems to have its individual requirements for
peptide binding. Thus the HLA class II repertoire of a given individual determines
which viral peptides can be presented to T cells. Recognition of the specific
HLA-peptide complex by the T cell receptor, accompanied by appropriate costimulatory
signals, leads to T cell activation, secretion of cytokines, and T cell proliferation.

Numerous studies demonstrate that HLA class II restricted CD4+ responses can be
determined by stimulating peripheral blood mononuclear cells with recombinant viral
antigens or peptides. Botarelli et al., (1993) Gastroenterology 104:580-587; Ferrari et
al., (1994) Hepatology 19:286-295; Minutello et al., (1993) J. Exp. Med. 178:17-25;
Hoffmann et al., (1995) Hepatology 21:632-638; Iwata et al., (1995) Hepatology
22:1057-1064; and Tsai et al., (1995) Hepatology 21:908-912.

Polyclonal multispecific CD8+ T cell responses have been detected in patients
with chronic hepatitis C. Additionally, CD8+ CTLs were shown to be important in
resolving acute HCV infection in chimpanzees (Cooper et al., Immunity 1999). About
50% of patients with chronic hepatitis C demonstrate a detectable virus-specific CD4+
T cell response, which is most frequently directed against HCV core and/or NS4 and
tends to be more common in patients who achieve sustained viral clearance during
interferon-α therapy.

Depending on the pattern of lymphokines, CD4+ T helper cells have been
classified as TH1, TH0, or TH2. Cytokines of the TH1 type are typically IFN-γ,
lymphotoxin, and interleukin-2 (IL-2), which are believed to support activation of
virus-specific CD8+ T cells and natural killer cells. The TH2 cytokines IL-4, IL-5,
IL-10, and IL-13 are important for B cell activation and differentiation, thus inducing a
humoral immune response.

During acute hepatitis C infection a strong and sustained TH1/TH0 response to
NS3 and possibly to other nonstructural proteins is associated with a self-limited course
of the disease. Diepolder et al., (1995) Lancet 346:1006-1007, showed all CD4+ T cell
clones to have a TH1 or TH0 cytokine profile, suggesting that the clones support
cytotoxic immune mechanisms in vivo. The majority of CD4+ T cell clones responded
to a relatively short segment of NS3, namely amino acids 1207-1278, suggesting that
this region of NS3 is immunodominant for CD4+ T cells. More than 70% of those
who contract HCV develop chronic infection and hepatitis, and a significant portion of
them progress to cirrhosis and eventually hepatocellular carcinoma. The only approved
therapy at present is a 6- to 12-month course of interferon α, which leads to sustained
improvement in only 20% of patients. So far, no commercial vaccine is available.

Thus, there remains a need for compositions and methods capable of promoting 
anti-HCV responses. 

SUMMARY OF THE INVENTION

In one aspect, the present invention relates to isolated polypeptides comprising 
mutant hepatitis C ("HCV") polypeptides comprising at least portions of NS3, NS4, 
and NS5. In a preferred aspect, NS3 is encoded by a nucleic acid sequence having an 
N-terminal deletion to remove the catalytic domain. The NS mutant polypeptides can
include NS3, NS4a, NS4b, NS5a, NS5b or portions thereof. For example, in various
embodiments, the mutant NS polypeptide comprises NS3, NS4 (NS4a and NS4b) and
NS5 (NS5a and NS5b). In other embodiments, the NS polypeptide consists of NS3 and
NS4 (for example, NS4a and/or NS4b) or NS3 and NS5 (for example, NS5a and/or
NS5b). Other combinations of full-length or fragments of non-structural components
are also contemplated.

In another preferred aspect, the polypeptides further comprise a viral 
polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are 
preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV. 
Other polypeptides are preferably E, or antigenic fragments thereof, more preferably,
E1 or E2 of HCV. Such polypeptides need not be encoded by a natural HCV genome,
and include, for example, truncated or otherwise mutant HCV polypeptides or
polypeptides derived from other genomes, such as, for example, polypeptides of HBV.

Thus, the invention includes an isolated mutant non-structural ("NS") HCV polypeptide
comprising a polypeptide having a mutation in the catalytic domain of NS3 that
functionally disrupts the catalytic domain. The mutation can be, for example, a
deletion or a substitution mutation. In certain embodiments, the mutant NS polypeptide
comprises NS3, NS4 and NS5. In other embodiments, the mutant NS polypeptides
described herein further comprise a second viral polypeptide that is not NS3, NS4, or
NS5 of HCV, for example an HCV Core polypeptide ("C"), or fragment thereof, or an
HCV envelope protein ("E"), for example E1 and/or E2. In certain embodiments, C is
truncated (e.g., at amino acid 121).

In another aspect, the present invention relates to compositions comprising any
of the mutant hepatitis C ("HCV") polypeptides described herein, for example
polypeptides comprising at least portions of NS3, NS4, and NS5. In a preferred aspect,
NS3 is encoded by a nucleic acid sequence having an N-terminal deletion to disrupt the
function of the catalytic domain, for example by removing this domain. In another
preferred aspect, the polypeptides further comprise a viral polypeptide that is not a
non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic
fragments thereof, more preferably, truncated C of HCV. Other polypeptides are
preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Such
polypeptides need not be encoded by a natural HCV genome, and include, for example,
truncated or otherwise mutant HCV polypeptides or polypeptides derived from other
genomes, such as, for example, polypeptides of HBV. In another aspect, the invention
includes a composition comprising (a) any of the polypeptides described herein; and (b)
a pharmaceutically acceptable excipient (e.g., carrier and/or adjuvant).

In another aspect, the invention includes an isolated and purified polynucleotide
which encodes any of the mutant HCV polypeptides described herein. In certain
embodiments, the invention includes a composition comprising (a) the isolated purified
polynucleotide encoding any of the mutant HCV polypeptides; and (b) a
pharmaceutically acceptable excipient. The polynucleotide can be, for example, DNA,
and can be carried in a plasmid. Additionally, the polynucleotides described herein may
be included in an expression vector as shown in the attached Figures and Sequence
Listings.

In another aspect, the present invention relates to host cells transformed with
expression vectors comprising a nucleic acid sequence encoding a mutant HCV
polypeptide comprising at least portions of NS3, NS4, and NS5. In a preferred aspect,
the expression vectors of the host cells further comprise at least one nucleic acid
sequence encoding a viral polypeptide that is not a non-structural HCV polypeptide.
Such polypeptides are preferably C, or antigenic fragments thereof, more preferably,
truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments
thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded
by a natural HCV genome, and include, for example, truncated or otherwise mutant
HCV polypeptides or polypeptides derived from other genomes, such as, for example,
polypeptides of HBV. In another preferred aspect the nucleic acid sequences of the
expression vectors are coexpressed. In yet another preferred aspect, the host cells are
yeast cells or mammalian cells.

In another aspect, the present invention relates to expression vectors comprising
a nucleic acid sequence encoding a mutant HCV polypeptide comprising NS3, NS4,
and NS5. In a preferred aspect, the expression vectors further comprise at least one
nucleic acid sequence encoding a viral polypeptide that is not a
non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic
fragments thereof, more preferably, truncated C of HCV. Other polypeptides are
preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV.
Importantly, such polypeptides need not be encoded by a natural HCV genome, and
include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides
derived from other genomes, such as, for example, polypeptides of HBV.

In another aspect, the present invention relates to methods of preparing mutant HCV
polypeptides. In a preferred aspect, the method comprises the steps of transforming a
host cell with an expression vector, said vector comprising a nucleic acid sequence
encoding a mutant HCV polypeptide comprising at least portions of NS3, NS4, and
NS5, and isolating said polypeptide. In another preferred aspect the HCV polypeptide
further comprises a viral polypeptide that is not a non-structural HCV polypeptide.
Such polypeptides are preferably C, or antigenic fragments thereof, more preferably,
truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments
thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded by
a natural HCV genome, and include, for example, truncated or otherwise mutant HCV
polypeptides or polypeptides derived from other genomes, such as, for example,
polypeptides of HBV. In another preferred aspect the host cells are yeast cells or
mammalian cells.

In another aspect, the present invention relates to antibodies which specifically
bind to a mutant HCV polypeptide comprising NS3, NS4, and NS5, and to methods of
making and using the same. In a preferred aspect, the HCV polypeptide further
comprises a viral polypeptide that is not a non-structural HCV polypeptide. Such
polypeptides are preferably C, or antigenic fragments thereof, more preferably,
truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments
thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded
by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV
polypeptides or polypeptides derived from other genomes, such as, for example,
polypeptides of HBV. In another preferred aspect, the antibody is either monoclonal or
polyclonal.

In yet another aspect, the invention provides a method of preparing a mutant NS HCV
polypeptide, wherein the method comprises the steps of (a) transforming a host cell with
any of the expression vectors described herein, under conditions wherein the polypeptide
is expressed; and (b) isolating the polypeptide. The host cell can be, for example, a yeast
cell, a mammalian cell, a plant cell or an insect cell. The polypeptide can be expressed
and isolated intracellularly or can be secreted and isolated from the surrounding
environment.

In a still further aspect, a method of eliciting an immune response in a subject is 
provided. The immune response can be elicited by administering any of the 
polynucleotides and/or polypeptides described herein in one or multiple doses.

These and other embodiments of the subject invention will readily occur to 
those of skill in the art in light of the disclosure herein. 

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 shows the cloning scheme for generating pCMV-NS35.
FIG. 2 shows the 9621bp vector pCMV-NS35.
FIG. 3 shows the nucleic acid sequence of pCMV-NS35 (SEQ ID NO:1), including the
nucleic acid sequence of the NS35 ORF, and also the translation of NS35 (SEQ ID
NO:2).
FIG. 4 shows the 9621bp pCMV-delNS35.
FIG. 5 shows the nucleic acid sequence of pCMV-delNS35 (SEQ ID NO:3), including
the nucleic acid sequence of the delNS35 ORF, and also the translation of the delNS35
polypeptide (SEQ ID NO:4).
FIG. 6 shows the 4276bp pCMV-II.
FIG. 7 shows the nucleic acid sequence of pCMV-II (SEQ ID NO:5).
FIG. 8 shows the 6300bp pCMV-NS34A.
FIG. 9 shows the nucleic acid sequence of pCMV-NS34A (SEQ ID NO:6), including
the nucleic acid sequence of the NS34A ORF, and also the translation of NS34A (SEQ
ID NO:7).
FIG. 10 shows the cloning scheme for generating pd.ΔNS3NS5.
FIG. 11 shows the nucleic and amino acid sequences of pd.ΔNS3NS5 (SEQ ID NO:8
and 9).
FIG. 12 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.
FIG. 13 shows the cloning scheme for generating pd.ΔNS3NS5.pj.
FIG. 14 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj (SEQ ID
NO:10 and 11).
FIG. 15 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of
ΔNS3NS5 polypeptide.
FIG. 16 shows the cloning scheme for generating pdΔNS3NS5.pj.core121RT and
pdΔNS3NS5.pj.core173RT.
FIG. 17 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core121 (SEQ
ID NO:12 and 13).
FIG. 18 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core173 (SEQ
ID NO:14 and 15).
FIG. 19 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of
ΔNS3NS5.core121 and ΔNS3NS5.core173 polypeptides. Lanes 1 and 7 show See
Blue Standards. Lane 2 shows control yeast plasmid. Lanes 3 and 4 show
ΔNS3NS5.core121RT polypeptide, colonies 1 and 2. Lanes 5 and 6 show
ΔNS3NS5.core173RT polypeptide, colonies 3 and 4.
FIG. 20 shows the cloning scheme for generating pdΔNS3NS5.pj.core140RT and
pdΔNS3NS5.pj.core150RT.
FIG. 21 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core140 (SEQ
ID NO:16 and 17).
FIG. 22 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core150 (SEQ
ID NO:18 and 19).
FIG. 23 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of
ΔNS3NS5core140 and ΔNS3NS5core150 polypeptides. Lane 1 shows See Blue
Standards. Lanes 2 and 3 show ΔNS3NS5core140RT polypeptide, colonies 5 and 6.
Lanes 4 and 5 show ΔNS3NS5core150RT polypeptide, colonies 7 and 8. Lane 6 shows
control yeast plasmid. Lane 7 shows ΔNS3NS5core121RT polypeptide, colony 1.
Lane 8 shows ΔNS3NS5core173RT polypeptide, colony 5.
DETAILED DESCRIPTION OF THE INVENTION 

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of molecular biology, microbiology, recombinant DNA
techniques, and immunology, which are within the skill of the art. Such techniques are
explained fully in the literature. See e.g., Sambrook, et al., MOLECULAR CLONING:
A LABORATORY MANUAL (1989); DNA CLONING, VOLUMES I AND II (D. N.
Glover ed. 1985); OLIGONUCLEOTIDE SYNTHESIS (M. J. Gait ed., 1984);
NUCLEIC ACID HYBRIDIZATION (B. D. Hames & S. J. Higgins eds. 1984);
TRANSCRIPTION AND TRANSLATION (B. D. Hames & S. J. Higgins eds. 1984);
ANIMAL CELL CULTURE (R. I. Freshney ed. 1986); IMMOBILIZED CELLS AND
ENZYMES (IRL Press, 1986); B. Perbal, A PRACTICAL GUIDE TO MOLECULAR
CLONING (1984); the series, METHODS IN ENZYMOLOGY (Academic Press,
Inc.); GENE TRANSFER VECTORS FOR MAMMALIAN CELLS (J. H. Miller and
M. P. Calos eds. 1987, Cold Spring Harbor Laboratory), Methods in Enzymology Vol.
154 and Vol. 155 (Wu and Grossman, and Wu, eds., respectively); Mayer and Walker
eds. (1987), IMMUNOCHEMICAL METHODS IN CELL AND
MOLECULAR BIOLOGY (Academic Press, London); Scopes, (1987), PROTEIN
PURIFICATION: PRINCIPLES AND PRACTICE, Second Edition (Springer-Verlag,
New York); and HANDBOOK OF EXPERIMENTAL IMMUNOLOGY, VOLUMES
I-IV (D. M. Weir and C. C. Blackwell eds. 1986).

It must be noted that, as used in this specification and the appended claims, the 
singular forms "a", "an" and "the" include plural referents unless the content clearly 
dictates otherwise. Thus, for example, reference to "an antigen" includes a mixture of 
two or more antigens, and the like.

I. Definitions 

In describing the present invention, the following terms will be employed, and 
are intended to be defined as indicated below. 

The term "hepatitis C virus" (HCV) refers to an agent causative of Non-A,
Non-B Hepatitis (NANBH). The nucleic acid sequence and putative amino acid sequence
of HCV is described in U.S. Patent Nos. 5,856,437 and 5,350,671. The disease caused by
HCV is called hepatitis C, formerly called NANBH. The term HCV, as used herein,
denotes a viral species of which pathogenic strains cause NANBH, as well as
attenuated strains or defective interfering particles derived therefrom.

HCV is a member of the viral family Flaviviridae. The morphology and
composition of Flavivirus particles are known, and are discussed in Reed et al., Curr.
Stud. Hematol. Blood Transfus. (1998), 62:1-37; HEPATITIS C VIRUSES IN FIELDS
VIROLOGY (B.N. Fields, D.M. Knipe, P.M. Howley, eds.) (3d ed. 1996). It has
recently been found that portions of the HCV genome are also homologous to
pestiviruses. Generally, with respect to morphology, Flaviviruses contain a central
nucleocapsid surrounded by a lipid bilayer. Virions are spherical and have a diameter
of about 40-50 nm. Their cores are about 25-30 nm in diameter. Along the outer
surface of the virion envelope are projections that are about 5-10 nm long with terminal
knobs about 2 nm in diameter.

The HCV genome is comprised of RNA. It is known that RNA containing
viruses have relatively high rates of spontaneous mutation. Therefore, there can be
multiple strains, which can be virulent or avirulent, within the HCV class or species.
The ORF of HCV, including the translation spans of the core, non-structural, and
envelope proteins, is shown in U.S. Patent Nos. 5,856,437 and 5,350,671.

The terms "polypeptide" and "protein" refer to a polymer of amino acid
residues and are not limited to a minimum length of the product. Thus, peptides,
oligopeptides, dimers, multimers, and the like, are included within the definition. Both
full-length proteins and fragments thereof are encompassed by the definition. The
terms also include postexpression modifications of the polypeptide, for example,
glycosylation, acetylation, phosphorylation and the like. Furthermore, for purposes of
the present invention, a "polypeptide" refers to a protein which includes modifications,
such as deletions, additions and substitutions (generally conservative in nature), to the
native sequence, so long as the protein maintains the desired activity. These
modifications may be deliberate, as through site-directed mutagenesis, or may be
accidental, such as through mutations of hosts which produce the proteins or errors due
to PCR amplification.

An HCV polypeptide is a polypeptide, as defined above, derived from the HCV
polyprotein. The polypeptide need not be physically derived from HCV, but may be
synthetically or recombinantly produced. Moreover, the polypeptide may be derived
from any of the various HCV strains, such as from strains 1, 2, 3 or 4 of HCV. A
number of conserved and variable regions are known between these strains and, in
general, the amino acid sequences of epitopes derived from these regions will have a
high degree of sequence homology, e.g., amino acid sequence homology of more than
30%, preferably more than 40%, when the two sequences are aligned and homology
determined by any of the programs or algorithms described herein. Thus, for example,
the term "NS4" polypeptide refers to native NS4 from any of the various HCV strains,
as well as NS4 analogs, muteins and immunogenic fragments, as defined further below.
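
As a rough illustration of how a homology threshold such as "more than 30%" might be checked, the sketch below computes percent identity over two sequences that have already been aligned. It is a minimal sketch under stated assumptions: the function name, the gap character and the toy sequences are hypothetical, and it is not one of the alignment programs or algorithms referred to in this specification.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns holding identical residues.

    Both inputs must already be aligned (equal length, gaps marked with
    `gap`). Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (not HCV sequence data).
identity = percent_identity("MKT-LLVA", "MKTALIVA")
print(f"{identity:.1f}% identical; above 30% threshold: {identity > 30.0}")
```

Dedicated alignment software would normally both produce the alignment and report the score; the sketch covers only the final counting step.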

Further, the terms "ΔNS35," "delNS35," "ΔNS3NS5," and "ΔNS3-5" as used
herein refer to a mutant polypeptide, comprising at least portions of NS3, NS4, or NS5,
comprising a deletion in, or mutation of, the NS3 protease active site region to render
the protease non-functional. In one embodiment, ΔNS3-5 comprises amino acids
1242-3011, as shown in FIG. 5, or polypeptides substantially homologous thereto. It will
be readily apparent to one of ordinary skill in the art how to determine that NS3 protease
has been rendered non-functional. If the protease is non-functional, the polypeptide is
not cleaved and one will obtain protein of the expected molecular weight upon
expression. As set forth in Example 2 and Figure 15, using SDS-PAGE (4-20%), a
protein having a molecular weight of approximately 194kD was obtained when strain
AD3 was transformed with pd.ΔNS3NS5.PJ clone #5. One skilled in the art could
readily determine whether a protein of the desired molecular weight was expressed for
any given deletion or mutation.
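
The expected molecular weight of an uncleaved polypeptide spanning amino acids 1242-3011 can be estimated with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value given in this specification; the function names are hypothetical. Under that assumption the estimate is in line with the approximately 194kD band reported above.

```python
# Assumed average mass per amino acid residue, in daltons (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def residue_count(first: int, last: int) -> int:
    """Number of residues in an inclusive amino acid range."""
    if last < first:
        raise ValueError("last residue must not precede first residue")
    return last - first + 1


def approx_mass_kd(n_residues: int, avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Rough molecular weight in kD from residue count alone."""
    return n_residues * avg_mass_da / 1000.0


n = residue_count(1242, 3011)
print(f"{n} residues, roughly {approx_mass_kd(n):.1f} kD")
```

A sequence-based calculation using the actual residue composition would be more accurate; this estimate is only a plausibility check against the gel.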

The terms "analog" and "mutein" refer to biologically active derivatives of the
reference molecule, or fragments of such derivatives, that retain desired activity, such
as the ability to stimulate a cell-mediated immune response, as defined below. In
general, the term "analog" refers to compounds having a native polypeptide sequence
and structure with one or more amino acid additions, substitutions (generally
conservative in nature) and/or deletions, relative to the native molecule, so long as the
modifications do not destroy immunogenic activity. The term "mutein" refers to
peptides having one or more peptide mimics ("peptoids"), such as those described in
International Publication No. WO 91/04282. Preferably, the analog or mutein has at
least the same immunoactivity as the native molecule. Methods for making
polypeptide analogs and muteins are known in the art and are described further below.

Particularly preferred analogs include substitutions that are conservative in
nature, i.e., those substitutions that take place within a family of amino acids that are
related in their side chains. Specifically, amino acids are generally divided into four
families: (1) acidic - aspartate and glutamate; (2) basic - lysine, arginine, histidine;
(3) non-polar - alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine,
tryptophan; and (4) uncharged polar - glycine, asparagine, glutamine, cysteine, serine,
threonine, tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified
as aromatic amino acids. For example, it is reasonably predictable that an isolated
replacement of leucine with isoleucine or valine, an aspartate with a glutamate, a
threonine with a serine, or a similar conservative replacement of an amino acid with a
structurally related amino acid, will not have a major effect on the biological activity.
For example, the polypeptide of interest may include up to about 5-10 conservative or
non-conservative amino acid substitutions, or even up to about 15-25 conservative or
non-conservative amino acid substitutions, or any integer between 5-25, so long as the
desired function of the molecule remains intact. One of skill in the art may readily
determine regions of the molecule of interest that can tolerate change by reference to
Hopp/Woods and Kyte-Doolittle plots, well known in the art.
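
The four side-chain families listed above translate directly into a lookup. The following is a minimal sketch, with hypothetical names, that treats a substitution as conservative when both residues (in one-letter code) fall in the same family; it does not model the separate aromatic grouping or any effect on biological activity.

```python
# Amino acid families as listed in the text, in one-letter code.
FAMILIES = {
    "acidic": set("DE"),                 # aspartate, glutamate
    "basic": set("KRH"),                 # lysine, arginine, histidine
    "non-polar": set("AVLIPFMW"),        # alanine ... tryptophan
    "uncharged polar": set("GNQCSTY"),   # glycine ... tyrosine
}


def family_of(residue: str) -> str:
    """Return the family name for a one-letter amino acid code."""
    code = residue.upper()
    for name, members in FAMILIES.items():
        if code in members:
            return name
    raise ValueError(f"unknown amino acid code: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues belong to the same side-chain family."""
    return family_of(original) == family_of(replacement)


# Examples from the text: Leu->Ile, Asp->Glu and Thr->Ser are conservative.
for pair in [("L", "I"), ("D", "E"), ("T", "S"), ("K", "D")]:
    print(pair, is_conservative(*pair))
```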

By "fragment" is intended a polypeptide consisting of only a part of the intact
full-length polypeptide sequence and structure. The fragment can include a C-terminal
deletion and/or an N-terminal deletion of the native polypeptide. An "immunogenic
fragment" of a particular HCV protein will generally include at least about 5-10
contiguous amino acid residues of the full-length molecule, preferably at least about
15-25 contiguous amino acid residues of the full-length molecule, and most preferably
at least about 20-50 or more contiguous amino acid residues of the full-length
molecule, that define an epitope, or any integer between 5 amino acids and the
full-length sequence, provided that the fragment in question retains immunogenic
activity, as measured by the assays described herein. For a description of various HCV
epitopes, see, e.g., Chien et al., Proc. Natl Acad. Sci. USA (1992) 89:10011-10015;
Chien et al., J. Gastroent Hepatol (1993) 8:S33-39; Chien et al., International
Publication No. WO 93/00365; Chien, D.Y., International Publication No. WO
94/01778; commonly owned, allowed U.S. Patent Application Serial Nos. 08/403,590
and 08/444,818.

The term "epitope" as used herein refers to a sequence of at least about 3 to 5,
preferably about 5 to 10 or 15, and not more than about 1,000 amino acids (or any
integer therebetween), which define a sequence that by itself or as part of a larger
sequence, binds to an antibody generated in response to such sequence. There is no
critical upper limit to the length of the fragment, which may comprise nearly the
full-length of the protein sequence, or even a fusion protein comprising two or more
epitopes from the HCV polyprotein. An epitope for use in the subject invention is not
limited to a polypeptide having the exact sequence of the portion of the parent protein
from which it is derived. Indeed, viral genomes are in a state of constant flux and
contain several variable domains which exhibit relatively high degrees of variability
between isolates. Thus the term "epitope" encompasses sequences identical to the
native sequence, as well as modifications to the native sequence, such as deletions,
additions and substitutions (generally conservative in nature).

Regions of a given polypeptide that include an epitope can be identified using
any number of epitope mapping techniques, well known in the art. See, e.g., Epitope
Mapping Protocols in Methods in Molecular Biology, Vol. 66 (Glenn E. Morris, Ed.,
1996) Humana Press, Totowa, New Jersey. For example, linear epitopes may be
determined by e.g., concurrently synthesizing large numbers of peptides on solid
supports, the peptides corresponding to portions of the protein molecule, and reacting
the peptides with antibodies while the peptides are still attached to the supports. Such
techniques are known in the art and described in, e.g., U.S. Patent No. 4,708,871;
Geysen et al. (1984) Proc. Natl Acad. Sci. USA 81:3998-4002; Geysen et al. (1986)
Molec. Immunol 23:709-715. Similarly, conformational epitopes are readily
identified by determining spatial conformation of amino acids such as by, e.g., x-ray
crystallography and 2-dimensional nuclear magnetic resonance. See, e.g., Epitope
Mapping Protocols, supra. Antigenic regions of proteins can also be identified using
standard antigenicity and hydropathy plots, such as those calculated using, e.g., the
Omiga version 1.0 software program available from the Oxford Molecular Group. This
computer program employs the Hopp/Woods method, Hopp et al., Proc. Natl Acad.
Sci USA (1981) 78:3824-3828 for determining antigenicity profiles, and the
Kyte-Doolittle technique, Kyte et al., J. Mol Biol. (1982) 157:105-132 for hydropathy
plots.
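
To make the hydropathy-plot idea concrete, the sketch below computes a sliding-window Kyte-Doolittle profile for a peptide sequence. It is an illustrative sketch only and is not the Omiga program: the per-residue values are the published Kyte-Doolittle hydropathy indices, while the function name, the default window of 9 residues and the toy sequence are assumptions made here. Lower (more hydrophilic) window averages are commonly read as candidate surface-exposed regions.

```python
# Kyte-Doolittle hydropathy index per residue (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each full window along the sequence.

    Returns len(sequence) - window + 1 values; entry i covers residues
    i .. i + window - 1 (0-based).
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]  # KeyError on unknown codes
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy peptide (not HCV sequence data).
profile = hydropathy_profile("MKTLLVAIGDEKRSNQ", window=9)
most_hydrophilic = min(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophilic window starts at index {most_hydrophilic}")
```

The window length controls smoothing; shorter windows follow local features more closely, while longer ones highlight extended hydrophobic or hydrophilic stretches.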
As used herein, the term "conformational epitope" refers to a portion of a
full-length protein, or an analog or mutein thereof, having structural features native to
the amino acid sequence encoding the epitope within the full-length natural protein.
Native structural features include, but are not limited to, glycosylation and three
dimensional structure. Preferably, a conformational epitope is produced recombinantly
and is expressed in a cell from which it is extractable under conditions which preserve its
desired structural features, e.g. without denaturation of the epitope. Such cells include
bacteria, yeast, insect, and mammalian cells. Expression and isolation of recombinant
conformational epitopes from the HCV polyprotein are described in e.g., International
Publication Nos. WO 96/04301, WO 94/01778, WO 95/33053, WO 92/08734.

An "immunological response" to an HCV antigen (including both polypeptide 

30 and polynucleotides encoding polypeptides that are expressed in vivo) or composition is 
the development in a subject of a humoral and/or a cellular immune response to 
molecules present in the composition of interest. For purposes of the present invention, 
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a "humoral immune response" refers to an immune response mediated by antibody 
molecules, while a "cellular immune response" is one mediated by T-lymphocytes 
and/or other white blood cells. One important aspect of cellular immunity involves an 
antigen-specific response by cytolytic T-cells ("CTLs"). CTLs have specificity for
peptide antigens that are presented in association with proteins encoded by the major
histocompatibility complex (MHC) and expressed on the surfaces of cells. CTLs help
induce and promote the intracellular destruction of intracellular microbes, or the lysis
of cells infected with such microbes. Another aspect of cellular immunity involves an
antigen-specific response by helper T-cells. Helper T-cells act to help stimulate the
function, and focus the activity of, nonspecific effector cells against cells displaying
peptide antigens in association with MHC molecules on their surface. A "cellular 
immune response" also refers to the production of cytokines, chemokines and other 
such molecules produced by activated T-cells and/or other white blood cells, including 
those derived from CD4+ and CD8+ T-cells. 

A composition or vaccine that elicits a cellular immune response may serve to
sensitize a vertebrate subject by the presentation of antigen in association with MHC
molecules at the cell surface. The cell-mediated immune response is directed at, or 
near, cells presenting antigen at their surface. In addition, antigen-specific T- 
lymphocytes can be generated to allow for the future protection of an immunized host. 

The ability of a particular antigen to stimulate a cell-mediated immunological
response may be determined by a number of assays, such as by lymphoproliferation
(lymphocyte activation) assays, CTL cytotoxic cell assays, or by assaying for T-
lymphocytes specific for the antigen in a sensitized subject. Such assays are well
known in the art. See, e.g., Erickson et al., J. Immunol. (1993) 151:4189-4199; Doe et
al., Eur. J. Immunol. (1994) 24:2369-2376; and the examples below.

Thus, an immunological response as used herein may be one which stimulates 
the production of CTLs, and/or the production or activation of helper T-cells. The
antigen of interest may also elicit an antibody-mediated immune response. Hence, an
immunological response may include one or more of the following effects: the
production of antibodies by B-cells; and/or the activation of suppressor T-cells and/or
γδ T-cells directed specifically to an antigen or antigens present in the composition or
vaccine of interest. These responses may serve to neutralize infectivity, and/or mediate
antibody-complement, or antibody dependent cell cytotoxicity (ADCC) to provide
protection or alleviation of symptoms to an immunized host. Such responses can be 
determined using standard immunoassays and neutralization assays, well known in the 
art. 

A "coding sequence" or a sequence which "encodes" a selected polypeptide, is a
nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the
case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of
appropriate regulatory sequences. The boundaries of the coding sequence are
determined by a start codon at the 5' (amino) terminus and a translation stop codon at
the 3' (carboxy) terminus. A transcription termination sequence may be located 3' to
the coding sequence.

A "nucleic acid" molecule or "polynucleotide" can include both double- and 
single-stranded sequences and refers to, but is not limited to, cDNA from viral, 
procaryotic or eucaryotic mRNA, genomic DNA sequences from viral (e.g. DNA
viruses and retroviruses) or procaryotic DNA, and especially synthetic DNA sequences.
The term also captures sequences that include any of the known base analogs of DNA 
and RNA. 

"Operably linked" refers to an arrangement of elements wherein the 
components so described are configured so as to perform their desired function. Thus,
a given promoter operably linked to a coding sequence is capable of effecting the
expression of the coding sequence when the proper transcription factors, etc., are
present. The promoter need not be contiguous with the coding sequence, so long as it
functions to direct the expression thereof. Thus, for example, intervening untranslated
yet transcribed sequences can be present between the promoter sequence and the coding
sequence, as can transcribed introns, and the promoter sequence can still be considered
"operably linked" to the coding sequence. 

"Recombinant" as used herein to describe a nucleic acid molecule means a 
polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by 
virtue of its origin or manipulation is not associated with all or a portion of the
polynucleotide with which it is associated in nature. The term "recombinant" as used
with respect to a protein or polypeptide means a polypeptide produced by expression of
a recombinant polynucleotide. In general, the gene of interest is cloned and then
expressed in transformed organisms, as described further below. The host organism
expresses the foreign gene to produce the protein under expression conditions. 

A "control element" refers to a polynucleotide sequence which aids in the 
expression of a coding sequence to which it is linked. The term includes promoters, 
transcription termination sequences, upstream regulatory domains, polyadenylation
signals, untranslated regions, including 5'-UTRs and 3'-UTRs and when appropriate,
leader sequences and enhancers, which collectively provide for the transcription and 
translation of a coding sequence in a host cell. 

A "promoter" as used herein is a DNA regulatory region capable of binding
RNA polymerase in a host cell and initiating transcription of a downstream (3'
direction) coding sequence operably linked thereto. For purposes of the present
invention, a promoter sequence includes the minimum number of bases or elements
necessary to initiate transcription of a gene of interest at levels detectable above
background. Within the promoter sequence is a transcription initiation site, as well as
protein binding domains (consensus sequences) responsible for the binding of RNA
polymerase. Eucaryotic promoters will often, but not always, contain "TATA" boxes 
and "CAT" boxes. 

A control sequence "directs the transcription" of a coding sequence in a cell 
when RNA polymerase will bind the promoter sequence and transcribe the coding 
sequence into mRNA, which is then translated into the polypeptide encoded by the
coding sequence. 

"Expression cassette" or "expression construct" refers to an assembly which is 
capable of directing the expression of the sequence(s) or gene(s) of interest. The 
expression cassette includes control elements, as described above, such as a promoter
which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s)
of interest, and often includes a polyadenylation sequence as well. Within certain
embodiments of the invention, the expression cassette described herein may be
contained within a plasmid construct. In addition to the components of the expression
cassette, the plasmid construct may also include one or more selectable markers, a
signal which allows the plasmid construct to exist as single-stranded DNA (e.g., an M13
origin of replication), at least one multiple cloning site, and a "mammalian" origin of
replication (e.g., an SV40 or adenovirus origin of replication).

"Transformation," as used herein, refers to the insertion of an exogenous
polynucleotide into a host cell, irrespective of the method used for insertion: for 
example, transformation by direct uptake, transfection, infection, and the like. For 
particular methods of transfection, see further below. The exogenous polynucleotide 
may be maintained as a nonintegrated vector, for example, an episome, or alternatively,
may be integrated into the host genome. 

A "host cell" is a cell which has been transformed, or is capable of 
transformation, by an exogenous DNA sequence. 

By "isolated" is meant, when referring to a polypeptide, that the indicated 
molecule is separate and discrete from the whole organism with which the molecule is
found in nature or is present in the substantial absence of other biological macro- 
molecules of the same type. The term "isolated" with respect to a polynucleotide is a 
nucleic acid molecule devoid, in whole or part, of sequences normally associated with 
it in nature; or a sequence, as it exists in nature, but having heterologous sequences in 
association therewith; or a molecule disassociated from the chromosome.

The term "purified" as used herein preferably means at least 75% by weight, 
more preferably at least 85% by weight, more preferably still at least 95% by weight, 
and most preferably at least 98% by weight, of biological macromolecules of the same 
type are present. 

"Homology" refers to the percent identity between two polynucleotide or two
polypeptide moieties. Two DNA, or two polypeptide sequences are "substantially
homologous" to each other when the sequences exhibit at least about 50%, preferably
at least about 75%, more preferably at least about 80%-85%, preferably at least about
90%, and most preferably at least about 95%-98%, or more, sequence identity over a
defined length of the molecules. As used herein, substantially homologous also refers
to sequences showing complete identity to the specified DNA or polypeptide sequence.
The term "substantially homologous" as used herein in reference to ΔNS35 generally
refers to an HCV nucleic or amino acid sequence that is at least 60% identical to the
entire sequence of the polypeptide encoded by ΔNS35 (see FIG. 5), where the sequence
identity is preferably at least 75%, more preferably at least 80%, still more preferably at
least about 85%, especially more than about 90%, most preferably 95% or greater,
particularly 98% or greater. These homologous polypeptides include fragments,
including mutants and allelic variants of the fragments. Identity between the two
sequences is preferably determined by the Smith-Waterman homology search algorithm
as implemented in the MPSRCH program (Oxford Molecular), using an affine gap
search with parameters gap open penalty=12 and gap extension penalty=1. Thus, for
example, the present invention includes an isolate which is 80% identical to a
polypeptide encoded by ΔNS35. In some aspects of the invention, the polypeptide of
the present invention is substantially homologous to the ΔNS35.

In general, "identity" refers to an exact nucleotide-to-nucleotide or amino acid-
to-amino acid correspondence of two polynucleotides or polypeptide sequences,
respectively. Percent identity can be determined by a direct comparison of the
sequence information between two molecules by aligning the sequences, counting the
exact number of matches between the two aligned sequences, dividing by the length of
the shorter sequence, and multiplying the result by 100. Readily available computer
programs can be used to aid in the analysis, such as ALIGN, Dayhoff, M.O. in Atlas of
Protein Sequence and Structure M.O. Dayhoff ed., 5 Suppl. 3:353-358, National
Biomedical Research Foundation, Washington, DC, which adapts the local homology
algorithm of Smith and Waterman Advances in Appl. Math. 2:482-489, 1981 for
peptide analysis. Programs for determining nucleotide sequence identity are available
in the Wisconsin Sequence Analysis Package, Version 8 (available from Genetics
Computer Group, Madison, WI) for example, the BESTFIT, FASTA and GAP
programs, which also rely on the Smith and Waterman algorithm. These programs are
readily utilized with the default parameters recommended by the manufacturer and
described in the Wisconsin Sequence Analysis Package referred to above. For
example, percent identity of a particular nucleotide sequence to a reference sequence
can be determined using the homology algorithm of Smith and Waterman with a
default scoring table and a gap penalty of six nucleotide positions.
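
The direct-comparison calculation described above (count exact matches between two aligned sequences, divide by the length of the shorter sequence, multiply by 100) can be sketched as follows. This is a minimal illustration, not any of the named programs; it assumes the two sequences have already been aligned, that gaps are written as "-", and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Matches are counted column by column, ignoring gap characters, and the
    count is divided by the ungapped length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter
```

For example, aligning "ACGT-A" against "ACCTTA" gives four matching columns over a shorter sequence of five residues, i.e. 80% identity.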

Another method of establishing percent identity in the context of the present 
invention is to use the MPSRCH package of programs copyrighted by the University of 
Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by
IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages the Smith-
Waterman algorithm can be employed where default parameters are used for the
scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a
gap of six). From the data generated the "Match" value reflects "sequence identity."
Other suitable programs for calculating the percent identity or similarity between 
sequences are generally known in the art, for example, another alignment program is 
BLAST, used with default parameters. For example, BLASTN and BLASTP can be
used with the following default parameters: genetic code = standard; filter = none;
strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50
sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL +
DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details
of these programs can be found at the following internet address:
http://www.ncbi.nlm.gov/cgi-bin/BLAST.

Alternatively, homology can be determined by hybridization of polynucleotides 
under conditions which form stable duplexes between homologous regions, followed 
by digestion with single-stranded-specific nuclease(s), and size determination of the 
digested fragments. DNA sequences that are substantially homologous can be
identified in a Southern hybridization experiment under, for example, stringent
conditions, as defined for that particular system. Defining appropriate hybridization
conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning,
supra; Nucleic Acid Hybridization, supra.

"Stringency" refers to conditions in a hybridization reaction that favor 

association of very similar sequences over sequences that differ. For example, the
combination of temperature and salt concentration should be chosen that is
approximately 12° to 20°C below the calculated Tm of the hybrid under study. The
temperature and salt conditions can often be determined empirically in preliminary
experiments in which samples of genomic DNA immobilized on filters are hybridized
to the sequence of interest and then washed under conditions of different stringencies.
See Sambrook et al. at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the 
complexity of the DNA being blotted and (2) the homology between the probe and the 
sequences being detected. The total amount of the fragment(s) to be studied can vary a
magnitude of 10, from 0.1 to 1 μg for a plasmid or phage digest to 10^-9 to 10^-8 g for a
single copy gene in a highly complex eukaryotic genome. For lower complexity
polynucleotides, substantially shorter blotting, hybridization, and exposure times, a
smaller amount of starting polynucleotides, and lower specific activity of probes can be
used. For example, a single-copy yeast gene can be detected with an exposure time of
only 1 hour starting with 1 μg of yeast DNA, blotting for two hours, and hybridizing
for 4-8 hours with a probe of 10^8 cpm/μg. For a single-copy mammalian gene a
conservative approach would start with 10 μg of DNA, blot overnight, and hybridize
overnight in the presence of 10% dextran sulfate using a probe of greater than 10^8
cpm/μg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid
between the probe and the fragment of interest, and consequently, the appropriate
conditions for hybridization and washing. In many cases the probe is not 100%
homologous to the fragment. Other commonly encountered variables include the
length and total G+C content of the hybridizing sequences and the ionic strength and
formamide content of the hybridization buffer. The effects of all of these factors can be
approximated by a single equation:

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in
base pairs (slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138:267-
284). In general, convenient hybridization temperatures in the presence of 50%
formamide are 42°C for a probe which is 95% to 100% homologous to the target
fragment, 37°C for 90% to 95% homology, and 32°C for 85% to 90% homology. For
lower homologies, formamide content should be lowered and temperature adjusted
accordingly, using the equation above. If the homology between the probe and the
target fragment is not known, the simplest approach is to start with both hybridization
and wash conditions which are nonstringent. If non-specific bands or high background
are observed after autoradiography, the filter can be washed at high stringency and
reexposed. If the time required for exposure makes this approach impractical, several
hybridization and/or washing stringencies should be tested in parallel.
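
The melting-temperature approximation given above lends itself to a short calculation. The sketch below simply evaluates that equation term by term; the function and parameter names are hypothetical, and the salt concentration is assumed to be expressed in moles per liter, which the text above does not state explicitly.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    Evaluates: 81 + 16.6*log10(Ci) + 0.4*%(G+C) - 0.6*%formamide
               - 600/n - 1.5*%mismatch
    salt_molar is assumed to be the monovalent ion concentration in mol/L.
    """
    if salt_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (
        81.0
        + 16.6 * math.log10(salt_molar)
        + 0.4 * gc_percent
        - 0.6 * formamide_percent
        - 600.0 / length_bp
        - 1.5 * mismatch_percent
    )
```

For instance, a 600 base pair hybrid with 50% G+C in 1 M monovalent salt, without formamide and without mismatches, evaluates to 81 + 0 + 20 - 0 - 1 - 0 = 100°C; each additional percent of mismatch lowers the estimate by 1.5°C.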

By "nucleic acid immunization" is meant the introduction of a nucleic acid 
molecule encoding one or more selected antigens into a host cell, for the in vivo 

expression of the antigen or antigens. The nucleic acid molecule can be introduced
directly into the recipient subject, such as by injection, inhalation, oral, intranasal and 
mucosal administration, or the like, or can be introduced ex vivo, into cells which have
been removed from the host. In the latter case, the transformed cells are reintroduced
into the subject where an immune response can be mounted against the antigen encoded
by the nucleic acid molecule. 

An "open reading frame" or ORF is a region of a polynucleotide sequence 
which encodes a polypeptide; this region can represent a portion of a coding sequence
or a total coding sequence. 

As used herein, the term "antibody" refers to a polypeptide or group of 
polypeptides which comprise at least one antigen binding site. An "antigen binding 
site" is formed from the folding of the variable domains of an antibody molecule(s) to
form three-dimensional binding sites with an internal surface shape and charge
distribution complementary to the features of an epitope of an antigen, which allows
specific binding to form an antibody-antigen complex. An antigen binding site may be
formed from a heavy- and/or light-chain domain (VH and VL, respectively), which
form hypervariable loops which contribute to antigen binding. The term "antibody"
includes, without limitation, polyclonal antibodies, monoclonal antibodies, chimeric
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single-domain
antibodies. In many cases, the binding phenomena of antibodies to antigens are
equivalent to other ligand/anti-ligand binding. 

If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit,
goat, horse, etc.) is immunized with an immunogenic polypeptide bearing an HCV
epitope(s). Serum from the immunized animal is collected and treated according to
known procedures. If serum containing polyclonal antibodies to an HCV epitope
contains antibodies to other antigens, the polyclonal antibodies can be purified by
immunoaffinity chromatography. Techniques for producing and processing polyclonal
antisera are known in the art, see for example, Mayer and Walker, eds. (1987)
IMMUNOCHEMICAL METHODS IN CELL AND MOLECULAR BIOLOGY
(Academic Press, London).

Monoclonal antibodies directed against HCV epitopes can also be readily 
produced by one skilled in the art. The general methodology for making monoclonal
antibodies by hybridomas is well known. Immortal antibody-producing cell lines can
be created by cell fusion, and also by other techniques such as direct transformation of
B lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g.,
M. Schreier et al. (1980) HYBRIDOMA TECHNIQUES; Hammerling et al. (1981),
MONOCLONAL ANTIBODIES AND T-CELL HYBRIDOMAS; Kennett et al.
(1980) MONOCLONAL ANTIBODIES; see also, U.S. Pat. Nos. 4,341,761; 4,399,121;
4,427,783; 4,444,887; 4,466,917; 4,472,500; 4,491,632; and 4,493,890. Panels of
monoclonal antibodies produced against HCV epitopes can be screened for various
properties; i.e., for isotype, epitope affinity, etc.

As used herein, a "single domain antibody" (dAb) is an antibody which is comprised
of a VH domain, which binds specifically with a designated antigen. A dAb does not
contain a VL domain, but may contain other antigen binding domains known to exist
in antibodies, for example, the kappa and lambda domains. Methods for preparing
dAbs are known in the art. See, for example, Ward et al., Nature 341: 544 (1989).

Antibodies can also be comprised of VH and VL domains, as well as other 
known antigen binding domains. Examples of these types of antibodies and methods 
for their preparation are known in the art (see, e.g., U.S. Pat. No. 4,816,467), and
include the following. For example, "vertebrate antibodies" refers to antibodies which
are tetramers or aggregates thereof, comprising light and heavy chains which are
usually aggregated in a "Y" configuration and which may or may not have covalent
linkages between the chains. In vertebrate antibodies, the amino acid sequences of the
chains are homologous with those sequences found in antibodies produced in
vertebrates, whether in situ or in vitro (for example, in hybridomas). Vertebrate
antibodies include, for example, purified polyclonal antibodies and monoclonal 
antibodies, methods for the preparation of which are described infra. 

"Hybrid antibodies" are antibodies where chains are separately homologous 
with reference to mammalian antibody chains and represent novel assemblies of them,
so that two different antigens are precipitable by the tetramer or aggregate. In hybrid
antibodies, one pair of heavy and light chains are homologous to those found in an
antibody raised against a first antigen, while a second pair of chains are homologous to
those found in an antibody raised against a second antigen. This results in the property
of "divalence", i.e., the ability to bind two antigens simultaneously. Such hybrids can
also be formed using chimeric chains, as set forth below.

"Chimeric antibodies" refers to antibodies in which the heavy and/or light 
chains are fusion proteins. Typically, one portion of the amino acid sequences of the
chain is homologous to corresponding sequences in an antibody derived from a
particular species or a particular class, while the remaining segment of the chain is
homologous to the sequences derived from another species and/or class. Usually, the
variable region of both light and heavy chains mimics the variable regions of antibodies
derived from one species of vertebrates, while the constant portions are homologous to
the sequences in the antibodies derived from another species of vertebrates. However,
the definition is not limited to this particular example. Also included is any antibody in
which either or both of the heavy or light chains are composed of combinations of
sequences mimicking the sequences in antibodies of different sources, whether these
sources be from differing classes or different species of origin, and whether or not the
fusion point is at the variable/constant boundary. Thus, it is possible to produce
antibodies in which neither the constant nor the variable region mimic known antibody
sequences. It then becomes possible, for example, to construct antibodies whose
variable region has a higher specific affinity for a particular antigen, or whose constant
region can elicit enhanced complement fixation, or to make other improvements in
properties possessed by a particular constant region.

Another example is "altered antibodies", which refers to antibodies in which the
naturally occurring amino acid sequence in a vertebrate antibody has been varied.
Utilizing recombinant DNA techniques, antibodies can be redesigned to obtain desired
characteristics. The possible variations are many, and range from the changing of one
or more amino acids to the complete redesign of a region, for example, the constant
region. Changes in the constant region are made, in general, to attain desired cellular
process characteristics, e.g., changes in complement fixation, interaction with
membranes, and other effector functions. Changes in the variable region can be made
to alter antigen binding characteristics. The antibody can also be engineered to aid the
specific delivery of a molecule or substance to a specific cell or tissue site. The desired
alterations can be made by known techniques in molecular biology, e.g., recombinant
techniques, site-directed mutagenesis, etc.

Yet another example is "univalent antibodies", which are aggregates
comprised of a heavy-chain/light-chain dimer bound to the Fc (i.e., stem) region of a
second heavy chain. This type of antibody escapes antigenic modulation. See, e.g.,
Glennie et al. Nature 295: 712 (1982). Included also within the definition of antibodies
are "Fab" fragments of antibodies. The "Fab" region refers to those portions of the
heavy and light chains which are roughly equivalent, or analogous, to the sequences
which comprise the branch portion of the heavy and light chains, and which have been
shown to exhibit immunological binding to a specified antigen, but which lack the
effector Fc portion. "Fab" includes aggregates of one heavy and one light chain
(commonly known as Fab'), as well as tetramers containing the 2H and 2L chains
(referred to as F(ab)2), which are capable of selectively reacting with a designated
antigen or antigen family. Fab antibodies can be divided into subsets analogous to those
described above, i.e., "vertebrate Fab", "hybrid Fab", "chimeric Fab", and "altered Fab".
Methods of producing Fab fragments of antibodies are known within the art and
include, for example, proteolysis, and synthesis by recombinant techniques.

"Antigen-antibody complex" refers to the complex formed by an antibody that 
is specifically bound to an epitope on an antigen. 

"Immunogenic polypeptide" refers to a polypeptide that elicits a cellular and/or 
humoral immune response in a mammal, whether alone or linked to a carrier, in the
presence or absence of an adjuvant. 

"Antigenic determinant" refers to the site on an antigen or hapten to which a 
specific antibody molecule or specific cell surface receptor binds. 

As used herein, "treatment" refers to any of (i) the prevention of infection or
reinfection, as in a traditional vaccine, (ii) the reduction or elimination of symptoms,
and (iii) the substantial or complete elimination of the pathogen in question. Treatment
may be effected prophylactically (prior to infection) or therapeutically (following 
infection). 

By "vertebrate subject" is meant any member of the subphylum chordata,
including, without limitation, humans and other primates, including non-human
primates such as chimpanzees and other apes and monkey species; farm animals such
as cattle, sheep, pigs, goats and horses; domestic mammals such as dogs and cats;
laboratory animals including rodents such as mice, rats and guinea pigs; birds,
including domestic, wild and game birds such as chickens, turkeys and other
gallinaceous birds, ducks, geese, and the like. The term does not denote a particular
age. Thus, both adult and newborn individuals are intended to be covered. The
invention described herein is intended for use in any of the above vertebrate species,
since the immune systems of all of these vertebrates operate similarly.

II. Modes of Carrying out the Invention 
Before describing the present invention in detail, it is to be understood that this
invention is not limited to particular formulations or process parameters as such may, of
course, vary. It is also to be understood that the terminology used herein is for the 
purpose of describing particular embodiments of the invention only, and is not intended 
to be limiting. 

Although a number of compositions and methods similar or equivalent to those
described herein can be used in the practice of the present invention, the preferred
materials and methods are described herein. 

General Overview 

An aim of an HCV vaccine is to generate broad immunity to a wide breadth of
antigens because HCV is so divergent and because humoral as well as cellular immune
responses are desirable to combat this human pathogen. While antibodies generated
against the envelope glycoprotein(s) might aid in virus neutralization, there is
additional benefit to be derived from a vaccine that includes other regions. The
likelihood of T-helper responses generated against a polypeptide would be helpful in a
vaccine setting as would generation of cytotoxic T cells. The non-structural region
represents such a candidate antigen, but processing by the protease generates several
polypeptides, making purification complicated. It would be advantageous, therefore, to
derive a non-structural cassette that is unprocessed by the NS3 protease.

The present invention solves this and other problems using compositions and
methods involving an N-terminal deletion in NS3, which removes the catalytic domain.
As such, some or all of the remainder of the non-structural region (through NS5B) is 
expressed as an intact polypeptide. Expression of this species has been documented in 
mammalian cells as well as in yeast. Further, in certain aspects, polynucleotides
encoding HCV core polypeptides (or fragments thereof) are added (e.g., operably
linked) to the carboxy-terminus of the non-structural cassette. As the core coding
region is relatively highly conserved among HCV isolates, the presence of this region
may enhance the immune response. Because core has at its C-terminus a very
hydrophobic domain (amino acids 174-191), shorter versions of core were also
engineered onto the polypeptide. As described in detail herein, the truncation of core to
amino acid 121 yielded higher expression than the amino acid 173 truncation when
engineered onto the C-terminus of the mutant NS polypeptide. The combination of
most of the non-structural region fused to a C-terminally truncated core into a 
polypeptide is novel and has advantages for vaccine immunization. Moreover, because 
the aim is not necessarily to generate antibody responses to this polypeptide, there is no 
need to maintain a native conformation, enabling a more facile purification protocol. 


Mutant HCV Non-Structural Polypeptides 

Genomes of HCV strains contain a single open reading frame of approximately
9,000 to 12,000 nucleotides, which is translated into a polyprotein. An HCV
polyprotein is cleaved to produce at least ten distinct products, in the order of NH2-
Core-E1-E2-p7-NS2-NS3-NS4a-NS4b-NS5a-NS5b-COOH. Mutant HCV
polypeptides of the invention contain an N-terminal deletion in NS3, which removes or
disables the catalytic domain. Preferably, the polypeptides also include the remainder
of the non-structural region, although in certain embodiments, the polypeptides may
include less than all of the remaining NS polypeptides, for example mutant NS
polypeptides including any combinations of NS2-NS3-NS4a-NS4b-NS5a-NS5b (e.g.,
NS3NS3-NS5a-NS5b; NS3-NS4a-NS4b; NS3-NS4a-NS4b-NS5a; NS3-NS4b-NS5a- 
NS5b; NS3-NS4a-NS5a; NS3-NS4b-NS5a; NS3-NS4b-NS5b; etc.). 

The HCV NS3 protein functions as a protease and a helicase and occurs at
approximately amino acid 1027 to amino acid 1657 of the polyprotein (numbered
relative to HCV-1). See Choo et al. (1991) Proc. Natl. Acad. Sci. USA 88:2451-2455.
HCV NS4 occurs at approximately amino acid 1658 to amino acid 1972, NS5a occurs
at approximately amino acid 1973 to amino acid 2420, and HCV NS5b occurs at
approximately amino acid 2421 to amino acid 3011 of the polyprotein (numbered
relative to HCV-1) (Choo et al., 1991).
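
As a quick arithmetic check on the approximate coordinates quoted above, the following Python sketch records the four regions and derives their lengths. The dictionary and function names are hypothetical, and the boundaries are the approximate HCV-1 values given in the text, not exact cleavage sites.

```python
# Approximate HCV-1 polyprotein coordinates quoted in the text
# (1-based, inclusive). Names are illustrative only.
NS_REGIONS = {
    "NS3": (1027, 1657),
    "NS4": (1658, 1972),
    "NS5a": (1973, 2420),
    "NS5b": (2421, 3011),
}


def region_length(name):
    """Number of residues in a region, counting both end points."""
    start, end = NS_REGIONS[name]
    return end - start + 1


def extract_region(polyprotein, name):
    """Slice a region out of a full-length polyprotein string."""
    start, end = NS_REGIONS[name]
    return polyprotein[start - 1:end]


def regions_are_contiguous(regions):
    """True when each region starts one residue after the previous one ends."""
    ordered = sorted(regions.values())
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start != prev_end + 1:
            return False
    return True
```

Under these coordinates the regions abut one another with no gaps, and together they span residues 1027 through 3011.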

The mutant polypeptides described herein can either be full-length polypeptides
or portions of NS3, NS4 (NS4a and NS4b), NS5a, and NS5b polypeptides. Epitopes of
NS3, NS4 (NS4a and NS4b), NS5a, NS5b, NS3NS4NS5a, and NS3NS4NS5aNS5b can
be identified by several methods. For example, NS3, NS4, NS5a, NS5b polypeptides
or fusion proteins comprising any combination of the above, can be isolated, for
example, by immunoaffinity purification using a monoclonal antibody for the
polypeptide or protein. The isolated protein sequence can then be screened by
preparing a series of short peptides by proteolytic cleavage of the purified protein,
which together span the entire protein sequence. By starting with, for example,
100-mer polypeptides, each polypeptide can be tested for the presence of epitopes
recognized by a T cell receptor on an HCV-activated T cell. Progressively smaller and
overlapping fragments can then be tested from an identified 100-mer to map the epitope
of interest.
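
The bookkeeping behind this kind of overlapping-fragment mapping can be sketched in Python. This is only an illustration of how fragment windows might be enumerated; the function name, the window sizes other than the 100-mer mentioned above, and the step sizes are assumptions, and the sketch says nothing about how the peptides are actually prepared or assayed.

```python
def overlapping_fragments(sequence, length, step):
    """Return (start, fragment) pairs covering the whole sequence.

    start is 1-based. Consecutive windows overlap when step < length.
    A final window is added, if needed, so the C-terminal end is covered.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) <= length:
        return [(1, sequence)]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, sequence[s:s + length]) for s in starts]


# Toy 230-residue sequence (illustrative only).
protein = "ACDEFGHIKL" * 23

# First pass: 100-mers offset by 50 residues (offset is an assumption).
first_pass = overlapping_fragments(protein, length=100, step=50)

# Second pass: smaller overlapping fragments from one identified 100-mer.
second_pass = overlapping_fragments(first_pass[0][1], length=20, step=10)
```

Repeating the second pass on whichever fragment scores positive narrows the window further at each round.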

Epitopes recognized by a T cell receptor on an HCV-activated T cell can be
identified by, for example, 51Cr release assay (see Example 2) or by
lymphoproliferation assay (see Example 4). In a 51Cr release assay, target cells can be
constructed that display the epitope of interest by cloning a polynucleotide encoding the
epitope into an expression vector and transforming the expression vector into the target
cells. Non-structural polypeptides can occur in any order in the fusion protein. If
desired, at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more of one or more of the polypeptides
may occur in the fusion protein. Multiple viral strains of HCV occur, and NS3, NS4,
NS5a, and NS5b polypeptides of any of these strains can be used in a fusion protein.

Nucleic acid and amino acid sequences of a number of HCV strains and
isolates, including nucleic acid and amino acid sequences of NS3, NS4, NS5a, NS5b
genes and polypeptides have been determined. For example, isolate HCV J1.1 is
described in Kubo et al. (1989) Japan. Nucl. Acids Res. 17:10367-10372; Takeuchi et
al. (1990) Gene 91:287-291; Takeuchi et al. (1990) J. Gen. Virol. 71:3027-3033; and
Takeuchi et al. (1990) Nucl. Acids Res. 18:4626. The complete coding sequences of
two independent isolates, HCV-J and BK, are described by Kato et al. (1990) Proc.
Natl. Acad. Sci. USA 87:9524-9528 and Takamizawa et al. (1991) J. Virol.
65:1105-1113, respectively.

Publications that describe HCV-1 isolates include Choo et al. (1990) Brit. Med.
Bull. 46:423-441; Choo et al. (1991) Proc. Natl. Acad. Sci. USA 88:2451-2455 and
Han et al. (1991) Proc. Natl. Acad. Sci. USA 88:1711-1715. HCV isolates HC-J1 and
HC-J4 are described in Okamoto et al. (1991) Japan J. Exp. Med. 60:167-177. HCV
isolates HCT 18, HCT 23, Th, HCT 27, EC1 and EC10 are described in Weiner et al.
(1991) Virol. 180:842-848. HCV isolates Pt-1, HCV-K1 and HCV-K2 are described in
Enomoto et al. (1990) Biochem. Biophys. Res. Commun. 170:1021-1025. HCV
isolates A, C, D & E are described in Tsukiyama-Kohara et al. (1991) Virus Genes
5:243-254.

Each of the mutant HCV polypeptides containing at least portions of NS3, NS4
and NS5 can be obtained from the same HCV strain or isolate or from different HCV
strains or isolates. Thus, each non-structural region of the polypeptide can be from the
same HCV strain or isolate or from different HCV strains or isolates. In addition
to the mutant HCV non-structural polypeptides described herein, the proteins can
contain other polypeptides derived from the HCV polyprotein. For example, it may be
desirable to include polypeptides derived from the core region of the HCV polyprotein.
This region occurs at amino acid positions 1-191 of the HCV polyprotein, numbered
relative to HCV-1. Either the full-length protein or epitopes of the full-length protein
may be used in the subject fusions, such as those epitopes found between amino acids
10-53, amino acids 10-45, amino acids 67-88, amino acids 120-130, or any of the core
epitopes identified in, e.g., Houghton et al., U.S. Patent No. 5,350,671; Chien et al.,
Proc. Natl. Acad. Sci. USA (1992) 89:10011-10015; Chien et al., J. Gastroent. Hepatol.
(1993) 8:S33-39; Chien et al., International Publication No. WO 93/00365; Chien,
D.Y., International Publication No. WO 94/01778; and commonly owned U.S. Patent
No. 6,150,087. When present, additional HCV polypeptides such as core
can be obtained from the same HCV strain or isolate or from different HCV strains or
isolates.

Preferably, the above-described mutant proteins, as well as the individual
components of these proteins, are produced recombinantly. A polynucleotide encoding
these proteins can be introduced into an expression vector which can be expressed in a
suitable expression system. A variety of bacterial, yeast, mammalian, insect and plant
expression systems are available in the art and any such expression system can be used.
Optionally, a polynucleotide encoding these proteins can be translated in a cell-free
translation system. Such methods are well known in the art. The proteins also can be
constructed by solid phase protein synthesis.

If desired, the mutant polypeptides, or the individual components of these
polypeptides, also can contain other amino acid sequences, such as amino acid linkers
or signal sequences, as well as ligands useful in protein purification, such as
glutathione-S-transferase and staphylococcal protein A.


Polynucleotides 

The polynucleotides of the present invention are not necessarily physically
derived from the nucleotide sequences shown, but can be generated in any manner,
including, for example, chemical synthesis or DNA replication or reverse transcription
or transcription. In addition, combinations of regions corresponding to that of the
designated sequences can be modified in ways known to the art to be consistent with an
intended use.

The DNA encoding the desired polypeptide, whether in fused or mature form,
and whether or not containing a signal sequence to permit secretion, can be ligated into
expression vectors suitable for any convenient host. Both eukaryotic and prokaryotic
host systems are presently used in forming recombinant polypeptides, and a summary
of some of the more common control systems and host cells is given below. The
polypeptide produced in such host cells is then isolated from lysed cells or from the
culture medium and purified to the extent needed for its intended use.

Purification can be by techniques known in the art, for example, differential
extraction, salt fractionation, chromatography on ion exchange resins, affinity
chromatography, centrifugation, alkali resolubilization of insoluble protein, and the
like. See, for example, Methods in Enzymology for a variety of methods for purifying
proteins.

Polynucleotides contain less than an entire HCV genome and can be RNA or
single- or double-stranded DNA. Preferably, the polynucleotides are isolated free of
other components, such as proteins and lipids. Polynucleotides of the invention can
also comprise other nucleotide sequences, such as sequences coding for linkers, signal
sequences, or ligands useful in protein purification such as glutathione-S-transferase
and staphylococcal protein A.

Polynucleotides encoding mutant HCV non-structural polypeptides can be
isolated from a genomic library derived from nucleic acid sequences present in, for
example, the plasma, serum, or liver homogenate of an HCV-infected individual or can
be synthesized in the laboratory, for example, using an automatic synthesizer. An
amplification method such as PCR can be used to amplify polynucleotides from either
HCV genomic RNA (following reverse transcription) or cDNA.
Further, while the polypeptides of the present invention that are not NS3, NS4,
or NS5 of HCV can comprise a substantially complete viral domain, in many
applications all that is required is that the polypeptide comprise an antigenic or
immunogenic region of the virus. An antigenic region of a polypeptide is generally
relatively small, typically 8 to 10 amino acids or fewer in length. Fragments of as few
as 5 amino acids can characterize an antigenic region. These segments can correspond to
regions of, for example, C, E1, or E2 epitopes. Accordingly, using the cDNAs of C, E1,
or E2 as a basis, DNAs encoding short segments of C, E1, or E2 polypeptides can be
expressed recombinantly either as fusion proteins, or as isolated polypeptides. In
addition, short amino acid sequences can be conveniently obtained by chemical
synthesis.

Polynucleotides encoding the polypeptides described herein can comprise
coding sequences for these polypeptides which occur naturally or can be artificial
sequences which do not occur in nature. These polynucleotides can be ligated to form a
coding sequence for the fusion proteins using standard molecular biology techniques.
If desired, polynucleotides can be cloned into an expression vector and transformed
into, for example, bacterial, yeast, insect, plant or mammalian cells so that the fusion
proteins of the invention can be expressed in and isolated from a cell culture.

The expression of polypeptides containing these domains in a variety of
recombinant host cells, including, for example, bacteria, yeast, insect, plant and
vertebrate cells, gives rise to important immunological reagents which can be used for
diagnosis, detection, and vaccines.

The general techniques used in extracting the genome from a virus, preparing
and probing a cDNA library, sequencing clones, constructing expression vectors,
transforming cells, performing immunological assays such as radioimmunoassays and
ELISA assays, growing cells in culture, and the like are known in the art and
laboratory manuals are available describing these techniques. However, as a general
guide, the following sets forth some sources currently available for such procedures,
and for materials useful in carrying them out.

Both prokaryotic and eukaryotic host cells may be used for expression of
desired coding sequences when appropriate control sequences which are compatible
with the designated host are used. Among prokaryotic hosts, E. coli is most frequently
used. Expression control sequences for prokaryotes include promoters, optionally
containing operator portions, and ribosome binding sites. Transfer vectors compatible
with prokaryotic hosts are commonly derived from, for example, pBR322, a plasmid
containing operons conferring ampicillin and tetracycline resistance, and the various
pUC vectors, which also contain sequences conferring antibiotic resistance markers.
These markers may be used to obtain successful transformants by selection. Commonly
used prokaryotic control sequences include the beta-lactamase (penicillinase) and
lactose promoter systems (Chang et al. (1977) Nature 198:1056), the tryptophan (trp)
promoter system (Goeddel et al. (1980) Nucleic Acid Res. 8:4057), the lambda-derived
P[L] promoter and N gene ribosome binding site (Shimatake et al. (1981) Nature
292:128) and the hybrid tac promoter (De Boer et al. (1983) Proc. Natl. Acad. Sci.
U.S.A. 292:128) derived from sequences of the trp and lac UV5 promoters. The
foregoing systems are particularly compatible with E. coli; if desired, other prokaryotic
hosts such as strains of Bacillus or Pseudomonas may be used, with corresponding
control sequences.

Eukaryotic hosts include mammalian and yeast cells in culture systems.
Mammalian cell lines available as hosts for expression are known in the art and include
many immortalized cell lines available from the American Type Culture Collection
(ATCC), including HeLa cells, Chinese hamster ovary (CHO) cells, baby hamster
kidney (BHK) cells, and a number of other cell lines. Suitable promoters for
mammalian cells are also known in the art and include viral promoters such as those
from Simian Virus 40 (SV40) (Fiers (1978) Nature 273:113), Rous sarcoma virus
(RSV), adenovirus (ADV), and bovine papilloma virus (BPV). Mammalian cells may
also require terminator sequences and poly A addition sequences; enhancer sequences
which increase expression may also be included, and sequences which cause
amplification of the gene may also be desirable. These sequences are known in the art.
Vectors suitable for replication in mammalian cells may include viral replicons, or
sequences which ensure integration of the appropriate sequences encoding NANBV
epitopes into the host genome.

The vaccinia virus system can also be used to express foreign DNA in
mammalian cells. To express heterologous genes, the foreign DNA is usually inserted
into the thymidine kinase gene of the vaccinia virus and then infected cells can be
selected. This procedure is known in the art and further information can be found in
these references (Mackett et al. J. Virol. 49:857-864 (1984) and Chapter 7 in DNA
Cloning, Vol. 2, IRL Press).

Yeast expression systems are also known to one of ordinary skill in the art. A
yeast promoter is any DNA sequence capable of binding yeast RNA polymerase and
initiating the downstream (3') transcription of a coding sequence (e.g., structural gene)
into mRNA. A promoter will have a transcription initiation region which is usually
placed proximal to the 5' end of the coding sequence. This transcription initiation
region usually includes an RNA polymerase binding site (the "TATA Box") and a
transcription initiation site. A yeast promoter may also have a second domain called an
upstream activator sequence (UAS), which, if present, is usually distal to the structural
gene. The UAS permits regulated (inducible) expression. Constitutive expression
occurs in the absence of a UAS. Regulated expression may be either positive or
negative, thereby either enhancing or reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore,
sequences encoding enzymes in the metabolic pathway provide particularly useful
promoter sequences. Examples include alcohol dehydrogenase (ADH) (EP-A-0 284
044), enolase, glucokinase, glucose-6-phosphate isomerase,
glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK)
(EPO-A-0 329 203). The yeast PHO5 gene, encoding acid phosphatase, also provides
useful promoter sequences (Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1).

In addition, synthetic promoters which do not occur in nature also function as
yeast promoters. For example, UAS sequences of one yeast promoter may be joined
with the transcription activation region of another yeast promoter, creating a synthetic
hybrid promoter. Examples of such hybrid promoters include the ADH regulatory
sequence linked to the GAP transcription activation region (US Patent Nos. 4,876,197
and 4,880,734). Other examples of hybrid promoters include promoters which consist
of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as
GAP or PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally
occurring promoters of non-yeast origin that have the ability to bind yeast RNA
polymerase and initiate transcription. Examples of such promoters include, inter alia,
(Cohen et al. (1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al. (1981)
Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol. Immunol. 96:119;
Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance Genes in
the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al.
(1980) Gene 11:163; Panthier et al. (1980) Curr. Genet. 2:109).

A DNA molecule may be expressed intracellularly in yeast. A promoter
sequence may be directly linked with the DNA molecule, in which case the first amino
acid at the N-terminus of the recombinant protein will always be a methionine, which is
encoded by the ATG start codon. If desired, methionine at the N-terminus may be
cleaved from the protein by in vitro incubation with cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as
for mammalian, baculovirus, and bacterial expression systems. Usually, a DNA
sequence encoding the N-terminal portion of an endogenous yeast protein, or other
stable protein, is fused to the 5' end of heterologous coding sequences. Upon
expression, this construct will provide a fusion of the two amino acid sequences. For
example, the yeast or human superoxide dismutase (SOD) gene can be linked at the 5'
terminus of a foreign gene and expressed in yeast. The DNA sequence at the junction
of the two amino acid sequences may or may not encode a cleavable site. See, e.g.,
EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is
made with the ubiquitin region that preferably retains a site for a processing enzyme
(e.g., ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign
protein. Through this method, therefore, native foreign protein can be isolated (e.g.,
WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth
media by creating chimeric DNA molecules that encode a fusion protein comprised of a
leader sequence fragment that provides for secretion in yeast of the foreign protein.
Preferably, there are processing sites encoded between the leader fragment and the
foreign gene that can be cleaved either in vivo or in vitro. The leader sequence
fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted
yeast proteins, such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086)
and the A-factor gene (US patent 4,588,684). Alternatively, leaders of non-yeast
origin, such as an interferon leader, exist that also provide for secretion in yeast
(EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the
yeast alpha-factor gene, which contains both a "pre" signal sequence, and a "pro"
region. The types of alpha-factor fragments that can be employed include the
full-length pre-pro alpha-factor leader (about 83 amino acid residues) as well as truncated
alpha-factor leaders (usually about 25 to about 50 amino acid residues) (US Patents
4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing an
alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders
made with a presequence of a first yeast, but a pro-region from a second yeast
alpha-factor. (See, e.g., WO 89/02463.)

Usually, transcription termination sequences recognized by yeast are regulatory
regions located 3' to the translation stop codon, and thus together with the promoter
flank the coding sequence. These sequences direct the transcription of an mRNA
which can be translated into the polypeptide encoded by the DNA. Examples of
transcription terminator sequences and other yeast-recognized termination sequences
include those from genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if
desired), coding sequence of interest, and transcription termination sequence, are put
together into expression constructs. Expression constructs are often maintained in a
replicon, such as an extrachromosomal element (e.g., plasmids) capable of stable
maintenance in a host, such as yeast or bacteria. The replicon may have two replication
systems, thus allowing it to be maintained, for example, in yeast for expression and in a
prokaryotic host for cloning and amplification. Examples of such yeast-bacteria shuttle
vectors include YEp24 (Botstein et al. (1979) Gene 8:17-24), pCl/1 (Brake et al.
(1984) Proc. Natl. Acad. Sci. USA 81:4642-4646), and YRp17 (Stinchcomb et al.
(1982) J. Mol. Biol. 158:157). In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number
ranging from about 5 to about 200, and usually about 10 to about 150. A host
containing a high copy number plasmid will preferably have at least about 10, and more
preferably at least about 20. Either a high or low copy number vector may be selected,
depending upon the effect of the vector and the foreign protein on the host. See, e.g.,
Brake et al., supra.
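
The 5'-to-3' ordering of components in an expression construct, as described above, can be sketched as a small Python data structure. This is an illustrative sketch only: the class name, field names, and the short placeholder strings are hypothetical and are not real promoter, leader, or terminator sequences. The validation rule (a directly expressed coding sequence begins with ATG) follows the statement in the text that direct promoter linkage yields an N-terminal methionine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpressionConstruct:
    """Promoter, optional leader, coding sequence, terminator (5' to 3')."""
    promoter: str
    coding_sequence: str
    terminator: str
    leader: Optional[str] = None  # secretion leader, if desired

    def validate(self):
        cds = self.coding_sequence.upper()
        if len(cds) % 3 != 0:
            raise ValueError("coding sequence length is not a multiple of three")
        # Without a leader, translation starts at the coding sequence itself.
        if self.leader is None and not cds.startswith("ATG"):
            raise ValueError("directly expressed coding sequence must start with ATG")

    def assemble(self):
        """Concatenate the components in 5'-to-3' order."""
        self.validate()
        return "".join(
            [self.promoter, self.leader or "", self.coding_sequence, self.terminator]
        )


# Placeholder strings for illustration; not real regulatory sequences.
construct = ExpressionConstruct(
    promoter="TATAAA",
    coding_sequence="ATGGCTTAA",
    terminator="AATAAA",
)
assembled = construct.assemble()
```

A secreted variant would set `leader` to a leader-encoding sequence, in which case the start codon is expected in the leader rather than in the coding sequence of interest.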

Alternatively, the expression constructs can be integrated into the yeast genome
with an integrating vector. Integrating vectors usually contain at least one sequence
homologous to a yeast chromosome that allows the vector to integrate, and preferably
contain two homologous sequences flanking the expression construct. Integrations
appear to result from recombinations between homologous DNA in the vector and the
yeast chromosome (Orr-Weaver et al. (1983) Methods in Enzymol. 101:228-245). An
integrating vector may be directed to a specific locus in yeast by selecting the
appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of
recombinant protein produced (Rine et al. (1983) Proc. Natl. Acad. Sci. USA
80:6750). The chromosomal sequences included in the vector can occur either as a
single segment in the vector, which results in the integration of the entire vector, or two
segments homologous to adjacent segments in the chromosome and flanking the
expression construct in the vector, which can result in the stable integration of only the
expression construct.

Usually, extrachromosomal and integrating expression constructs may contain
selectable markers to allow for the selection of yeast strains that have been transformed.
Selectable markers may include biosynthetic genes that can be expressed in the yeast
host, such as ADE2, HIS4, LEU2, TRP1, and ALG7, and the G418 resistance gene,
which confer resistance in yeast cells to tunicamycin and G418, respectively. In
addition, a suitable selectable marker may also provide yeast with the ability to grow in
the presence of toxic compounds, such as metals. For example, the presence of CUP1
allows yeast to grow in the presence of copper ions (Butt et al. (1987) Microbiol. Rev.
51:351).

Alternatively, some of the above described components can be put together into
transformation vectors. Transformation vectors are usually comprised of a selectable
marker that is either maintained in a replicon or developed into an integrating vector, as
described above.

Expression and transformation vectors, either extrachromosomal replicons or
integrating vectors, have been developed for transformation into many yeasts. For
example, expression vectors have been developed for, inter alia, the following yeasts:
Candida albicans (Kurtz, et al. (1986) Mol. Cell. Biol. 6:142), Candida maltosa
(Kunze, et al. (1985) J. Basic Microbiol. 25:141), Hansenula polymorpha (Gleeson,
et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen.
Genet. 202:302), Kluyveromyces fragilis (Das, et al. (1984) J. Bacteriol. 158:1165),
Kluyveromyces lactis (De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den
Berg et al. (1990) Bio/Technology 8:135), Pichia guillerimondii (Kunze et al. (1985)
J. Basic Microbiol. 25:141), Pichia pastoris (Cregg, et al. (1985) Mol. Cell. Biol.
5:3376; US Patent Nos. 4,837,148 and 4,929,555), Saccharomyces cerevisiae (Hinnen
et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol.
153:163), Schizosaccharomyces pombe (Beach and Nurse (1981) Nature 300:706), and
Yarrowia lipolytica (Davidow, et al. (1985) Curr. Genet. 10:39; Gaillardin, et al.
(1985) Curr. Genet. 10:49).

Methods of introducing exogenous DNA into yeast hosts are well-known in the
art, and usually include either the transformation of spheroplasts or of intact yeast cells
treated with alkali cations. Transformation procedures usually vary with the yeast
species to be transformed. (See, e.g., Kurtz et al. (1986) Mol. Cell. Biol. 6:142;
Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida; Gleeson et al. (1986) J.
Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302;
Hansenula; Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J.
Bacteriol. 154:1165; Van den Berg et al. (1990) Bio/Technology 8:135;
Kluyveromyces; Cregg et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J.
Basic Microbiol. 25:141; US Patent Nos. 4,837,148 and 4,929,555; Pichia; Hinnen et
al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol.
153:163; Saccharomyces; Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces; Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al.
(1985) Curr. Genet. 10:49; Yarrowia).

Bacterial expression techniques are known in the art. A bacterial promoter is
any DNA sequence capable of binding bacterial RNA polymerase and initiating the
downstream (3') transcription of a coding sequence (e.g., structural gene) into mRNA.
A promoter will have a transcription initiation region which is usually placed proximal
to the 5' end of the coding sequence. This transcription initiation region usually
includes an RNA polymerase binding site and a transcription initiation site. A bacterial
promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator
permits negatively regulated (inducible) transcription, as a gene repressor protein may
bind the operator and thereby inhibit transcription of a specific gene. Constitutive
expression may occur in the absence of negative regulatory elements, such as the
operator. In addition, positive regulation may be achieved by a gene activator protein
binding sequence, which, if present, is usually proximal (5') to the RNA polymerase
binding sequence. An example of a gene activator protein is the catabolite activator
protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli
(E. coli) (Raibaud et al. (1984) Annu. Rev. Genet. 18:173). Regulated expression
may therefore be either positive or negative, thereby either enhancing or reducing
transcription.

Expression and transformation vectors, either extra-chromosomal replicons or
integrating vectors, have been developed for transformation into many bacteria. For
example, expression vectors have been developed for, inter alia, the following bacteria:
Bacillus subtilis (Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0
036 259 and EP-A-0 063 953; WO 84/04541), Escherichia coli (Shimatake et al.
(1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al. (1986) J.
Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907),
Streptococcus cremoris (Powell et al. (1988) Appl. Environ. Microbiol. 54:655),
Streptococcus lividans (Powell et al. (1988) Appl. Environ. Microbiol. 54:655), and
Streptomyces lividans (US patent 4,745,056).

Methods of introducing exogenous DNA into bacterial hosts are well-known in
the art, and usually include the transformation of bacteria treated with CaCl2 or
other agents, such as divalent cations and DMSO. DNA can also be introduced into
bacterial cells by electroporation. Transformation procedures usually vary with the
bacterial species to be transformed. (See, e.g., Masson et al. (1989) FEMS Microbiol.
Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036
259 and EP-A-0 063 953; WO 84/04541; Bacillus; Miller et al. (1988) Proc. Natl.
Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949; Campylobacter; Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res.
16:6127; Kushner (1978) "An improved method for transformation of Escherichia coli
with ColE1-derived plasmids," in: Genetic Engineering: Proceedings of the
International Symposium on Genetic Engineering (eds. H.W. Boyer and S. Nicosia);
Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo (1988) Biochim. Biophys. Acta
949:318; Escherichia; Chassy et al. (1987) FEMS Microbiol. Lett. 44:113;
Lactobacillus; Fiedler et al. (1988) Anal. Biochem. 170:38; Pseudomonas; Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203; Staphylococcus; Barany et al. (1980) J.
Bacteriol. 144:698; Harlander (1987) "Transformation of Streptococcus lactis by
electroporation," in: Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et
al. (1981) Infect. Immun. 32:1295; Powell et al. (1988) Appl. Environ. Microbiol.
54:655; Somkuti et al. (1987) Proc. 4th Evr. Cong. Biotechnology 1:412;
Streptococcus).

In addition, viral antigens can be expressed in insect cells by the Baculovirus
system. A general guide to Baculovirus expression by Summers and Smith is A Manual
of Methods for Baculovirus Vectors and Insect Cell Culture Procedures (Texas
Agricultural Experiment Station Bulletin No. 1555). To incorporate the heterologous
gene into the Baculovirus genome, the gene is first cloned into a transfer vector
containing some Baculovirus sequences. This transfer vector, when it is cotransfected
with wild-type virus into insect cells, will recombine with the wild-type virus. Usually,
the transfer vector will be engineered so that the heterologous gene will disrupt the
wild-type Baculovirus polyhedrin gene. This disruption enables easy selection of the
recombinant virus since the cells infected with the recombinant virus will appear
phenotypically different from the cells infected with the wild-type virus. The purified
recombinant virus can be used to infect cells to express the heterologous gene. The
foreign protein can be secreted into the medium if a signal peptide is linked in frame to
the heterologous gene; otherwise, the protein will be bound in the cell lysates. For
further information, see Smith et al. Mol. & Cell. Biol. 3:2156-2165 (1983) or Luckow
and Summers in Virology 170:31-39 (1989).

Expression can also be effected in plant cells. There are many plant
cell culture and whole plant genetic expression systems known in the art. Exemplary
plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic
expression in plant cell culture have been described by Zenk, Phytochemistry
30:3861-3863 (1991). Descriptions of plant protein signal peptides may be found, in
addition to the references described above, in Vaulcombe et al., Mol. Gen. Genet.
209:33-40 (1987); Chandler et al., Plant Molecular Biology 3:407-418 (1984); Rogers,
J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356 (1987);
Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the
regulation of plant gene expression by the phytohormone gibberellic acid, and of
secreted enzymes induced by gibberellic acid, can be found in R.L. Jones and J.
MacMillan, Gibberellins, in: Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984,
Pitman Publishing Limited, London, pp. 21-52. References that describe other
metabolically-regulated genes: Sheen, Plant Cell 2:1027-1038 (1990); Maas et al.,
EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci. 84:1337-1339
(1987).

All plants from which protoplasts can be isolated and cultured to give whole
regenerated plants can be transformed by the present invention so that whole plants are
recovered which contain the transferred gene. It is known that practically all plants can
be regenerated from cultured cells or tissues, including but not limited to all major
species of sugarcane, sugar beet, cotton, fruit and other trees, legumes and vegetables.
Some suitable plants include, for example, species from the genera Fragaria, Lotus,
Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium,
Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana,
Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis,
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis,
Cucumis, Browallia, Glycine, Lolium, Zea, Triticum, and Sorghum.

Transformation can be by any method for introducing polynucleotides into a
host cell, including, for example, packaging the polynucleotide in a virus and
transducing a host cell with the virus, and direct uptake of the polynucleotide. The
transformation procedure used depends upon the host to be transformed. Bacterial
transformation by direct uptake generally employs treatment with calcium or rubidium
chloride (Cohen (1972), Proc. Natl. Acad. Sci. U.S.A. 69:2110; Maniatis et al. (1982),
MOLECULAR CLONING: A LABORATORY MANUAL (Cold Spring Harbor Press,
Cold Spring Harbor, N.Y.)). Yeast transformation by direct uptake may be carried out
using the method of Hinnen et al. (1978) Proc. Natl. Acad. Sci. U.S.A. 75:1929.
Mammalian transformations by direct uptake may be conducted using the calcium
phosphate precipitation method of Graham and Van der Eb (1978), Virology 52:546 or
the various known modifications thereof.

Vector construction employs techniques which are known in the art. Site-specific
DNA cleavage is performed by treating with suitable restriction enzymes under
conditions which generally are specified by the manufacturer of these commercially
available enzymes. The cleaved fragments may be separated using polyacrylamide or
agarose gel electrophoresis techniques, according to the general procedures found in
Methods in Enzymology (1980) 65:499-560. Sticky-ended cleavage fragments may be
blunt ended using E. coli DNA polymerase I (Klenow) in the presence of the
appropriate deoxynucleotide triphosphates (dNTPs). Treatment with S1 nuclease may
also be used, resulting in the hydrolysis of any single-stranded DNA portions.

Ligations are carried out using standard buffer and temperature conditions using
T4 DNA ligase and ATP; sticky end ligations require less ATP and less ligase than
blunt end ligations. When vector fragments are used as part of a ligation mixture, the
vector fragment is often treated with bacterial alkaline phosphatase (BAP) or calf
intestinal alkaline phosphatase to remove the 5'-phosphate and thus prevent religation
of the vector; alternatively, restriction enzyme digestion of unwanted fragments can be
used to prevent ligation. Ligation mixtures are transformed into suitable cloning hosts,
such as E. coli, and successful transformants are selected by, for example, antibiotic
resistance, and screened for the correct construction.

Synthetic oligonucleotides may be prepared using an automated oligonucleotide
synthesizer as described by Warner (1984), DNA 3:401. If desired, the synthetic strands
may be labeled with 32P by treatment with polynucleotide kinase in the presence of
32P-ATP, using standard conditions for the reaction. DNA sequences, including those
isolated from cDNA libraries, may be modified by known techniques, including, for
example, site-directed mutagenesis, as described by Zoller (1982), Nucleic Acids Res.
10:6487.

The expression constructs of the present invention, including the desired fusion,
or individual expression constructs comprising the individual components of these
fusions, may be used for nucleic acid immunization, to activate HCV-specific T cells,
using standard gene delivery protocols. Methods for gene delivery are known in the
art. See, e.g., U.S. Patent Nos. 5,399,346, 5,580,859, 5,589,466. Genes can be
delivered either directly to the vertebrate subject or, alternatively, delivered ex vivo, to
cells derived from the subject and the cells reimplanted in the subject. For example, the
constructs can be delivered as plasmid DNA, e.g., contained within a plasmid, such as
pBR322, pUC, or ColE1.

Additionally, the expression constructs can be packaged in liposomes prior to
delivery to the cells. Lipid encapsulation is generally accomplished using liposomes
which are able to stably bind or entrap and retain nucleic acid. The ratio of condensed
DNA to lipid preparation can vary but will generally be around 1:1 (mg
DNA:micromoles lipid), or more of lipid. For a review of the use of liposomes as
carriers for delivery of nucleic acids, see Hug and Sleight, Biochim. Biophys. Acta
(1991) 1097:1-17; Straubinger et al., in Methods in Enzymology (1983), Vol. 101, pp.
512-527.
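
By way of illustration only, the DNA-to-lipid ratio stated above (around 1 mg condensed DNA per micromole of lipid, or more of lipid) can be turned into a small calculation. The following is a minimal sketch; the function name and the default ratio parameter are assumptions introduced here for illustration and are not part of any described protocol.

```python
def lipid_micromoles(dna_mg: float, lipid_umol_per_mg_dna: float = 1.0) -> float:
    """Return the lipid amount (micromoles) to pair with a mass of condensed DNA (mg).

    The default of 1.0 reflects the approximate 1:1 (mg DNA : micromoles lipid)
    ratio given in the text; larger values correspond to "more of lipid".
    The function and parameter names are hypothetical.
    """
    if dna_mg < 0:
        raise ValueError("DNA mass must be non-negative")
    if lipid_umol_per_mg_dna <= 0:
        raise ValueError("lipid ratio must be positive")
    return dna_mg * lipid_umol_per_mg_dna


if __name__ == "__main__":
    # 0.5 mg of DNA at the nominal 1:1 ratio
    print(lipid_micromoles(0.5))
    # 2 mg of DNA with a 1.5-fold excess of lipid (an illustrative value)
    print(lipid_micromoles(2.0, 1.5))
```

The ratio is kept as a parameter because the text states only that it can vary around 1:1.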

Liposomal preparations for use with the present invention include cationic
(positively charged), anionic (negatively charged) and neutral preparations, with
cationic liposomes particularly preferred. Cationic liposomes are readily available. For
example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes
are available under the trademark Lipofectin, from GIBCO BRL, Grand Island, NY.
(See, also, Felgner et al., Proc. Natl. Acad. Sci. USA (1987) 84:7413-7416). Other






WO 01/38360 



PCT/US00/32326 



commercially available lipids include Transfectace (DDAB/DOPE) and DOTAP/DOPE
(Boehringer). Other cationic liposomes can be prepared from readily available
materials using techniques well known in the art. See, e.g., Szoka et al., Proc. Natl.
Acad. Sci. USA (1978) 75:4194-4198; PCT Publication No. WO 90/11092 for a
description of the synthesis of DOTAP
(1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes. The various
liposome-nucleic acid complexes are prepared using methods known in the art. See,
e.g., Straubinger et al., in METHODS IN ENZYMOLOGY (1983), Vol. 101, pp.
512-527; Szoka et al., Proc. Natl. Acad. Sci. USA (1978) 75:4194-4198;
Papahadjopoulos et al., Biochim. Biophys. Acta (1975) 394:483; Wilson et al., Cell
(1979) 17:77; Deamer and Bangham, Biochim. Biophys. Acta (1976) 443:629; Ostro et
al., Biochem. Biophys. Res. Commun. (1977) 76:836; Fraley et al., Proc. Natl. Acad.
Sci. USA (1979) 76:3348; Enoch and Strittmatter, Proc. Natl. Acad. Sci. USA (1979)
76:145; Fraley et al., J. Biol. Chem. (1980) 255:10431; Szoka and Papahadjopoulos,
Proc. Natl. Acad. Sci. USA (1978) 75:145; and Schaefer-Ridder et al., Science (1982)
215:166.

The DNA can also be delivered in cochleate lipid compositions similar to those 
described by Papahadjopoulos et al., Biochim. Biophys. Acta (1975) 394:483-491.
See, also, U.S. Patent Nos. 4,663,161 and 4,871,488. 

A number of viral based systems have been developed for gene transfer into
mammalian cells. For example, retroviruses, such as murine sarcoma virus, mouse
mammary tumor virus, Moloney murine leukemia virus, and Rous sarcoma virus,
provide a convenient platform for gene delivery systems. A selected gene can be
inserted into a vector and packaged in retroviral particles using techniques known in the
art. The recombinant virus can then be isolated and delivered to cells of the subject
either in vivo or ex vivo. A number of retroviral systems have been described (U.S.
Patent No. 5,219,740; Miller and Rosman, BioTechniques (1989) 7:980-990; Miller,
A.D., Human Gene Therapy (1990) 1:5-14; Scarpa et al., Virology (1991) 180:849-852;
Burns et al., Proc. Natl. Acad. Sci. USA (1993) 90:8033-8037; and Boris-Lawrie and
Temin, Cur. Opin. Genet. Develop. (1993) 3:102-109). Briefly, retroviral gene delivery
vehicles of the present invention may be readily constructed from a wide variety of
retroviruses, including, for example, B, C, and D type retroviruses as well as
spumaviruses and lentiviruses such as FIV, HIV, HIV-1, HIV-2 and SIV (see RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985). Such
retroviruses may be readily obtained from depositories or collections such as the
American Type Culture Collection ("ATCC"; 10801 University Blvd., Manassas, VA
20110-2209), or isolated from known sources using commonly available techniques.

A number of adenovirus vectors have also been described, such as adenovirus
Type 2 and Type 5 vectors. Unlike retroviruses, which integrate into the host genome,
adenoviruses persist extrachromosomally, thus minimizing the risks associated with
insertional mutagenesis (Haj-Ahmad and Graham, J. Virol. (1986) 57:267-274; Bett et
al., J. Virol. (1993) 67:5911-5921; Mittereder et al., Human Gene Therapy (1994)
5:717-729; Seth et al., J. Virol. (1994) 68:933-940; Barr et al., Gene Therapy (1994)
1:51-58; Berkner, K.L., BioTechniques (1988) 6:616-629; and Rich et al., Human Gene
Therapy (1993) 4:461-476).

Molecular conjugate vectors, such as the adenovirus chimeric vectors described
in Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl.
Acad. Sci. USA (1992) 89:6099-6103, can also be used for gene delivery.

Members of the Alphavirus genus, such as but not limited to vectors derived
from the Sindbis, Semliki Forest, and VEE viruses, will also find use as viral vectors for
delivering the gene of interest. For a description of Sindbis virus-derived vectors useful
for the practice of the instant methods, see Dubensky et al., J. Virol. (1996)
70:508-519; and International Publication Nos. WO 95/07995 and WO 96/17072.

Other vectors can be used, including but not limited to simian virus 40 and
cytomegalovirus. Bacterial vectors, such as Salmonella spp., Yersinia enterocolitica,
Shigella spp., Vibrio cholerae, Mycobacterium strain BCG, and Listeria
monocytogenes can be used. Minichromosomes such as MC and MC1, bacteriophages,
cosmids (plasmids into which phage lambda cos sites have been inserted) and replicons
(genetic elements that are capable of replication under their own control in a cell) can
also be used.

The expression constructs may also be encapsulated, adsorbed to, or associated
with, particulate carriers. Such carriers present multiple copies of a selected molecule
to the immune system and promote trapping and retention of molecules in local lymph
nodes. The particles can be phagocytosed by macrophages and can enhance antigen
presentation through cytokine release. Examples of particulate carriers include those
derived from polymethyl methacrylate polymers, as well as microparticles derived from
poly(lactides) and poly(lactide-co-glycolides), known as PLG. See, e.g., Jeffery et al.,
Pharm. Res. (1993) 10:362-368; and McGee et al., J. Microencap. (1996).

A wide variety of other methods can be used to deliver the expression
constructs to cells. Such methods include DEAE dextran-mediated transfection,
calcium phosphate precipitation, polylysine- or polyornithine-mediated transfection, or
precipitation using other insoluble inorganic salts, such as strontium phosphate,
aluminum silicates including bentonite and kaolin, chromic oxide, magnesium silicate,
talc, and the like. Other useful methods of transfection include electroporation,
sonoporation, protoplast fusion, liposomes, peptoid delivery, or microinjection. See,
e.g., Sambrook et al., supra, for a discussion of techniques for transforming cells of
interest; and Felgner, P.L., Advanced Drug Delivery Reviews (1990) 5:163-187, for a
review of delivery systems useful for gene transfer. One particularly effective method
of delivering DNA using electroporation is described in International Publication No.
WO/0045823.

Additionally, biolistic delivery systems employing particulate carriers, such as
gold and tungsten, are especially useful for delivering the expression constructs of the
present invention. The particles are coated with the construct to be delivered and
accelerated to high velocity, generally under a reduced atmosphere, using a gunpowder
discharge from a "gene gun." For a description of such techniques, and apparatuses
useful therefor, see, e.g., U.S. Patent Nos. 4,945,050; 5,036,006; 5,100,792;
5,179,022; 5,371,015; and 5,478,744.

Compositions 

The invention also provides compositions comprising the HCV polypeptides or
polynucleotides described herein. Such compositions are useful as diagnostics, for
example, using the mutant polypeptides (or polynucleotides encoding these
polypeptides) in diagnostic reagents. Diagnostics using polypeptides and
polynucleotides are known to those of skill in the art.

In addition, immunogenic compounds can be prepared from one or more
immunogenic polypeptides derived from the polypeptides described herein, for
example the ΔNS35 polypeptide. The preparation of immunogenic compounds which
contain immunogenic polypeptide(s) as active ingredients is known to one skilled in the
art. Typically, such immunogenic compounds are prepared as injectables, either as
liquid solutions or suspensions; solid forms suitable for solution in, or suspension in,
liquid prior to injection can also be prepared. The preparation can also be emulsified, or
the protein encapsulated in liposomes.

Immunogenic and diagnostic compositions of the invention preferably comprise
a pharmaceutically acceptable carrier. The carrier should not itself induce the
production of antibodies harmful to the host. Pharmaceutically acceptable carriers are
well known to those in the art. Such carriers include, but are not limited to, large,
slowly metabolized macromolecules, such as proteins, polysaccharides such as latex
functionalized sepharose, agarose, cellulose, cellulose beads and the like, polylactic
acids, polyglycolic acids, polymeric amino acids such as polyglutamic acid, polylysine,
and the like, amino acid copolymers, and inactive virus particles.

Pharmaceutically acceptable salts can also be used in compositions of the
invention, for example, mineral salts such as hydrochlorides, hydrobromides,
phosphates, or sulfates, as well as salts of organic acids such as acetates, propionates,
malonates, or benzoates. Especially useful protein substrates are serum albumins,
keyhole limpet hemocyanin, immunoglobulin molecules, thyroglobulin, ovalbumin,
tetanus toxoid, and other proteins well known to those of skill in the art. Compositions
of the invention can also contain liquids or excipients, such as water, saline, glycerol,
dextrose, ethanol, or the like, singly or in combination, as well as substances such as
wetting agents, emulsifying agents, or pH buffering agents. Liposomes can also be
used as a carrier for a composition of the invention; such liposomes are described
above.

If desired, co-stimulatory molecules which improve immunogen presentation to
lymphocytes, such as B7-1 or B7-2, or cytokines such as GM-CSF, IL-2, and IL-12,
can be included in a composition of the invention. Optionally, adjuvants can also be
included in a composition. Adjuvants which can be used include, but are not limited to:
(1) aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate,
aluminum sulfate, etc.; (2) oil-in-water emulsion formulations (with or without other
specific immunostimulating agents such as muramyl peptides (see below) or bacterial
cell wall components), such as, for example, (a) MF59 (PCT Publ. No. WO 90/14837),
containing 5% Squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing
various amounts of MTP-PE), formulated into submicron particles using a
microfluidizer such as Model 110Y microfluidizer (Microfluidics, Newton, MA),
(b) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-blocked polymer
L121, and thr-MDP (see below) either microfluidized into a submicron emulsion or
vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system
(RAS) (Ribi Immunochem, Hamilton, MT), containing 2% Squalene, 0.2% Tween 80,
and one or more bacterial cell wall components from the group consisting of
monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton
(CWS), preferably MPL + CWS (Detox™); (3) saponin adjuvants, such as Stimulon™
(Cambridge Bioscience, Worcester, MA), or particles generated therefrom,
such as ISCOMs (immunostimulating complexes); (4) Complete Freund's Adjuvant
(CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins
(e.g., IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (e.g., gamma
interferon), macrophage colony stimulating factor (M-CSF), tumor necrosis factor
(TNF), etc.; (6) detoxified mutants of a bacterial ADP-ribosylating toxin such as a
cholera toxin (CT), a pertussis toxin (PT), or an E. coli heat-labile toxin (LT),
particularly LT-K63, LT-R72, CT-S109, PT-K9/G129; see, e.g., WO 93/13302 and
WO 92/19265; (7) other substances that act as immunostimulating agents to enhance
the effectiveness of the composition; and (8) microparticles with adsorbed
macromolecules, as described in copending U.S. Patent Application Serial No.
09/285,855 (filed April 2, 1999) and International Patent Application Serial No.
PCT/US99/17308 (filed July 29, 1999). Alum and MF59 are preferred. The
effectiveness of an adjuvant can be determined by measuring the amount of antibodies
directed against an immunogenic polypeptide containing an HCV antigenic sequence
resulting from administration of this polypeptide in immunogenic compounds which
also comprise the various adjuvants.

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(CGP 19835A, referred to as MTP-PE), etc.

Thus, such recombinant or synthetic HCV polypeptides can be used in vaccines
and as diagnostics. Further, antibodies raised against these polypeptides can also be
used as diagnostics, or for passive immunotherapy. In addition, antibodies to these
polypeptides are useful for isolating and identifying HCV particles.

Native HCV antigens can also be isolated from HCV virions. The virions can be
grown in HCV infected cells in tissue culture, or in an infected host.

Administration and Delivery 

The polynucleotide and polypeptide compositions described herein (e.g.,
immunogenic compounds) may be administered to a subject using any suitable delivery
means. Methods of delivering nucleic acids into host cells are discussed above.
Further, HCV polynucleotides and/or polypeptides can be administered parenterally, by
injection, usually subcutaneously, intramuscularly, transdermally or transcutaneously.
Certain adjuvants, e.g., LT-K63, LT-R72 or PLG formulations, can be administered
intranasally or orally. Additional formulations which are suitable for other modes of
administration include suppositories. For suppositories, traditional binders and carriers
can include, for example, polyalkylene glycols or triglycerides; such suppositories can
be formed from mixtures containing the active ingredient in the range of 0.5% to 10%,
preferably 1%-2%. Oral formulations include such normally employed excipients
as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium
stearate, sodium saccharine, cellulose, magnesium carbonate, and the like. These
compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained
release formulations or powders and contain 10%-95% of active ingredient, preferably
25%-70%.

The polypeptides of the present invention can be formulated into the
immunogenic compound as neutral or salt forms. Pharmaceutically acceptable salts
include the acid addition salts (formed with free amino groups of the peptide), which
are formed with inorganic acids such as, for example, hydrochloric or
phosphoric acids, or organic acids such as acetic, oxalic, tartaric, maleic, and the
like. Salts formed with the free carboxyl groups can also be derived from inorganic
bases such as, for example, sodium, potassium, ammonium, calcium, or ferric
hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino
ethanol, histidine, procaine, and the like.

The immunogenic compounds are administered in a manner compatible with the
dosage formulation, and in such amount as will be prophylactically and/or
therapeutically effective. The quantity to be administered, which is generally in the
range of 5 micrograms to 250 micrograms of polypeptide per dose, depends on the
subject to be treated, capacity of the subject's immune system to synthesize antibodies,
and the degree of protection desired. Precise amounts of active ingredient required to be
administered may depend on the judgment of the practitioner and can be peculiar to
each subject.

The immunogenic compound can be given in a single dose schedule, or
preferably in a multiple dose schedule. A multiple dose schedule is one in which a
primary course of vaccination can be with 1-10 separate doses, followed by other doses
given at subsequent time intervals required to maintain and/or reinforce the immune
response, for example, at 1-4 months for a second dose, and if needed, a subsequent
dose(s) after several months. Further, the course of administration may include
polynucleotides and polypeptides, together or sequentially (for example, priming with a
polynucleotide composition and boosting with a polypeptide composition). The dosage
regimen will also, at least in part, be determined by the need of the individual and be
dependent upon the judgment of the practitioner.

In certain embodiments, administration of the polynucleotides and polypeptides
described herein is used to activate T cells. In addition to the practical advantages of
simplicity of construction and modification, administration of polynucleotides encoding
mutant NS polypeptides results in the synthesis of a mutant NS polypeptide in the host.
Thus, these immunogens are presented to the host immune system with native
post-translational modifications, structure, and conformation. The polynucleotides are
preferably injected intramuscularly to a large mammal, such as a human, at a dose of
0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5 or 10 mg/kg.
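
As a purely arithmetic illustration of the weight-based dose levels listed above, the total polynucleotide mass per injection is the dose level multiplied by body weight. The sketch below assumes a hypothetical 70 kg subject; the function name and the example weight are illustrative assumptions, not values taken from this disclosure.

```python
# Dose levels (mg/kg) listed in the text for intramuscular polynucleotide injection.
DOSE_LEVELS_MG_PER_KG = (0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5.0, 10.0)


def total_dose_mg(body_weight_kg: float, dose_mg_per_kg: float) -> float:
    """Return the total polynucleotide mass (mg) for one weight-based dose."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if dose_mg_per_kg not in DOSE_LEVELS_MG_PER_KG:
        raise ValueError("dose level is not one of the listed mg/kg values")
    return body_weight_kg * dose_mg_per_kg


if __name__ == "__main__":
    weight_kg = 70.0  # hypothetical body weight, for illustration only
    for level in DOSE_LEVELS_MG_PER_KG:
        print(f"{level:>5} mg/kg -> {total_dose_mg(weight_kg, level):.1f} mg")
```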

The proteins and/or polynucleotides can be administered either to a mammal
which is not infected with an HCV or can be administered to an HCV-infected
mammal. The particular dosages of the polynucleotides or fusion proteins in a
composition will depend on many factors including, but not limited to, the species,
age, and general condition of the mammal to which the composition is administered,
and the mode of administration of the composition. An effective amount of the
composition of the invention can be readily determined using only routine
experimentation. In vitro and in vivo models can be employed to identify appropriate
doses. Generally, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5 or 10 mg will be administered to a large
mammal, such as a baboon, chimpanzee, or human. If desired, co-stimulatory
molecules or adjuvants can also be provided before, after, or together with the
compositions.

Antibodies and Diagnostics

Antibodies, both monoclonal and polyclonal, which are directed against HCV 
epitopes are particularly useful in diagnosis, and those which are neutralizing are useful 
in passive immunotherapy. Monoclonal antibodies, in particular, may be used to raise 
anti-idiotype antibodies. 

Anti-idiotype antibodies are immunoglobulins which carry an "internal image"
of the antigen of the infectious agent against which protection is desired. Techniques
for raising anti-idiotype antibodies are known in the art. See, e.g., Grzych (1985),
Nature 316:74; MacNamara et al. (1984), Science 226:1325; Uytdehaag et al. (1985), J.
Immunol. 134:1225. These anti-idiotype antibodies may also be useful for treatment
and/or diagnosis of NANBH, as well as for an elucidation of the immunogenic regions
of HCV antigens.

An immunoassay for viral antigen may use, for example, a monoclonal antibody
directed towards a viral epitope, a combination of monoclonal antibodies directed
towards epitopes of one viral polypeptide, monoclonal antibodies directed towards
epitopes of different viral polypeptides, polyclonal antibodies directed towards the
same viral antigen, polyclonal antibodies directed towards different viral antigens, or a
combination of monoclonal and polyclonal antibodies.

Immunoassay protocols may be based, for example, upon competition, or direct
reaction, or sandwich type assays. Protocols may also, for example, use solid supports,
or may be by immunoprecipitation. Most assays involve the use of labeled antibody or
polypeptide. The labels may be, for example, fluorescent, chemiluminescent,
radioactive, or dye molecules. Assays which amplify the signals from the probe are also
known, examples of which are assays which utilize biotin and avidin, and
enzyme-labeled and mediated immunoassays, such as ELISA assays.

An enzyme-linked immunosorbent assay (ELISA) can be used to measure either
antigen or antibody concentrations. This method depends upon conjugation of an
enzyme to either an antigen or an antibody, and uses the bound enzyme activity as a
quantitative label. To measure antibody, the known antigen is fixed to a solid phase
(e.g., a microplate or plastic cup), incubated with test serum dilutions, washed,
incubated with anti-immunoglobulin labeled with an enzyme, and washed again.
Enzymes suitable for labeling are known in the art, and include, for example,
horseradish peroxidase. Enzyme activity bound to the solid phase is measured by
adding the specific substrate, and determining product formation or substrate utilization
colorimetrically. The enzyme activity bound is a direct function of the amount of
antibody bound.
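
Because the antibody ELISA described above is run on a series of test serum dilutions, its readout is often summarized as an endpoint titer. The following minimal sketch assumes one common convention, namely the reciprocal of the highest dilution whose colorimetric signal still exceeds a chosen cutoff; the cutoff value, the readings, and the function name are hypothetical and are not taken from this disclosure.

```python
from typing import Mapping, Optional


def endpoint_titer(readings: Mapping[int, float], cutoff: float) -> Optional[int]:
    """Return the largest reciprocal dilution whose absorbance exceeds the cutoff.

    readings maps reciprocal serum dilution (e.g. 100 for 1:100) to absorbance.
    Returns None when no dilution gives a signal above the cutoff.
    """
    positive = [dilution for dilution, absorbance in readings.items()
                if absorbance > cutoff]
    return max(positive) if positive else None


if __name__ == "__main__":
    # Hypothetical absorbance readings for a four-fold serum dilution series.
    serum = {100: 1.80, 400: 0.90, 1600: 0.35, 6400: 0.08}
    print(endpoint_titer(serum, cutoff=0.20))
```

In practice the cutoff would be derived from negative-control wells; here it is simply passed in as an assumed parameter.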

To measure antigen, a known specific antibody is fixed to the solid phase, the
test material containing antigen is added, after an incubation the solid phase is washed,
and a second enzyme-labeled antibody is added. After washing, substrate is added, and
enzyme activity is estimated colorimetrically, and related to antigen concentration.
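
Relating the colorimetric signal to antigen concentration is typically done against a standard curve of known antigen concentrations. The sketch below uses simple piecewise-linear interpolation between standards, which is an assumption chosen for brevity (other curve fits are also used in practice); the standards, absorbance values, and function name are hypothetical.

```python
from typing import Sequence, Tuple


def antigen_concentration(standards: Sequence[Tuple[float, float]],
                          absorbance: float) -> float:
    """Interpolate an antigen concentration from a standard curve.

    standards is a sequence of (concentration, absorbance) pairs measured for
    known antigen amounts. The unknown is located between the two standards
    whose absorbances bracket it and interpolated linearly.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 <= absorbance <= a1:
            if a1 == a0:
                return c0
            return c0 + (absorbance - a0) * (c1 - c0) / (a1 - a0)
    raise ValueError("absorbance outside the range of the standard curve")


if __name__ == "__main__":
    # Hypothetical standards: (antigen concentration in ng/ml, absorbance).
    curve = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05), (100.0, 2.05)]
    print(round(antigen_concentration(curve, 0.65), 3))
```

Readings outside the standard range raise an error rather than being extrapolated, since the curve says nothing about that region.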

The HCV fusion proteins, such as NS3 mutant and core fusion proteins, can
also be used to produce HCV-specific polyclonal and monoclonal antibodies.
HCV-specific polyclonal and monoclonal antibodies specifically bind to HCV antigens.

Polyclonal antibodies can be produced by administering the fusion protein to a
mammal, such as a mouse, a rabbit, a goat, or a horse. Serum from the immunized
animal is collected and the antibodies are purified from the serum by, for example,
precipitation with ammonium sulfate, followed by chromatography, preferably affinity
chromatography. Techniques for producing and processing polyclonal antisera are
known in the art.

Monoclonal antibodies directed against HCV-specific epitopes present in the
fusion proteins can also be readily produced. Normal B cells from a mammal, such as a
mouse, immunized with, e.g., a mutant NS3 polypeptide or NS-core fusion protein can
be fused with, for example, HAT-sensitive mouse myeloma cells to produce
hybridomas. Hybridomas producing HCV-specific antibodies can be identified using
RIA or ELISA and isolated by cloning in semi-solid agar or by limiting dilution.
Clones producing HCV-specific antibodies are isolated by another round of screening.

Antibodies, either monoclonal or polyclonal, which are directed against HCV
epitopes, are particularly useful for detecting the presence of HCV or HCV antigens in
a sample, such as a serum sample from an HCV-infected human. An immunoassay for
an HCV antigen may utilize one antibody or several antibodies. An immunoassay for
an HCV antigen may use, for example, a monoclonal antibody directed towards an
HCV epitope, a combination of monoclonal antibodies directed towards epitopes of one
HCV polypeptide, monoclonal antibodies directed towards epitopes of different HCV
polypeptides, polyclonal antibodies directed towards the same HCV antigen, polyclonal
antibodies directed towards different HCV antigens, or a combination of monoclonal
and polyclonal antibodies. Immunoassay protocols may be based, for example, upon
competition, direct reaction, or sandwich type assays using, for example, labeled
antibody. The labels may be, for example, fluorescent, chemiluminescent, or
radioactive.

The polyclonal or monoclonal antibodies may further be used to isolate HCV
particles or antigens by immunoaffinity columns. The antibodies can be affixed to a
solid support by, for example, adsorption or by covalent linkage so that the antibodies
retain their immunoselective activity. Optionally, spacer groups may be included so
that the antigen binding site of the antibody remains accessible. The immobilized
antibodies can then be used to bind HCV particles or antigens from a biological sample,
such as blood or plasma. The bound HCV particles or antigens are recovered from the
column matrix by, for example, a change in pH.

Methods of Eliciting Immune Responses

HCV-specific T cells that are activated by the above-described polypeptides,
expressed in vivo or in vitro, preferably recognize an epitope of an HCV polypeptide
such as a mutant NS3 polypeptide, including an epitope of a mutant HCV polypeptide.
HCV-specific T cells can be CD8+ or CD4+.

HCV-specific CD8+ T cells preferably are cytotoxic T lymphocytes (CTL)
which can kill HCV-infected cells that display NS3, NS4, NS5a, or NS5b epitopes
complexed with an MHC class I molecule. HCV-specific CD8+ T cells may also
express interferon-γ (IFN-γ). HCV-specific CD8+ T cells can be detected by, for
example, 51Cr release assays. 51Cr release assays measure the ability of HCV-specific
CD8+ T cells to lyse target cells displaying a nonstructural (e.g., mutant NS) epitope.
HCV-specific CD8+ T cells which express IFN-γ can also be detected by
immunological methods, preferably by intracellular staining for IFN-γ after in vitro
stimulation with a mutant NS polypeptide.

HCV-specific CD4+ cells activated by the above-described polypeptides,
expressed in vivo or in vitro, and combinations of the individual components of these
proteins, preferably recognize an epitope of a mutant non-structural polypeptide,
including an epitope of a mutant protein, that is bound to an MHC class II molecule on
an HCV-infected cell and proliferate in response to stimulating mutant peptides.

HCV-specific CD4+ T cells can be detected by a lymphoproliferation assay.
Lymphoproliferation assays measure the ability of HCV-specific CD4+ T cells to
proliferate in response to an epitope.
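
As an illustration of how a lymphoproliferation readout might be summarized, the following Python sketch computes a stimulation index, i.e. the ratio of mean label incorporation in antigen-stimulated wells to that in unstimulated wells. The stimulation index is an assumed, commonly used summary and is not specified in this document; the function name and the well counts are hypothetical.

```python
from statistics import mean


def stimulation_index(stimulated_cpm, unstimulated_cpm):
    """Ratio of mean counts with antigen to mean counts without antigen.

    Both arguments are sequences of replicate well counts (counts per minute).
    This metric is an assumption for illustration, not taken from the text.
    """
    background = mean(unstimulated_cpm)
    if background <= 0:
        raise ValueError("background counts must be positive")
    return mean(stimulated_cpm) / background


# Hypothetical triplicate wells, with and without the stimulating epitope.
si = stimulation_index([12000, 11500, 12500], [1000, 1100, 900])
print(f"stimulation index = {si:.1f}")
```

A higher index indicates stronger proliferation in response to the epitope; any cutoff for calling a response positive would have to be set by the laboratory running the assay.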

Mutant NS (or fusions thereof with core, envelope or other viral polypeptides)
can be used to activate HCV-specific T cells either in vitro or in vivo. Activation of
HCV-specific T cells can be used, inter alia, to provide model systems to optimize
CTL responses to HCV and to provide prophylactic or therapeutic treatment against
HCV infection. For in vitro activation, proteins are preferably supplied to T cells via a
plasmid or a viral vector, such as an adenovirus vector, as described above.

Polyclonal populations of T cells can be derived from the blood, and preferably
from peripheral lymphoid organs, such as lymph nodes, spleen, or thymus, of mammals
that have been infected with an HCV. Preferred mammals include mice, chimpanzees,
baboons, and humans. The HCV serves to expand the number of activated HCV-specific
T cells in the mammal. The HCV-specific T cells derived from the mammal
can then be restimulated in vitro by adding HCV epitopic peptides to the T cells. The
HCV-specific T cells can then be tested for, inter alia, proliferation (e.g.,
lymphoproliferation assays known in the art), the production of IFN-γ, and the ability
to lyse target cells displaying HCV NS epitopes in vitro.

The following examples are meant to illustrate the invention and are not meant
to limit it in any way. Those of ordinary skill in the art will recognize modifications 
within the spirit and scope of the invention as set forth herein. 

EXAMPLES
Example 1: Constructs 

pCMV-II: pCMV-II (Figure 7, SEQ ID NO:5) was created to contain the human
CMV promoter, enhancer, intron A, polylinker and the bovine growth hormone
terminator in a deleted-pUC backbone (Life Technologies).

pT7-HCV: pT7-HCV was created in a polylinker-modified pUC vector to
contain full-length HCV cDNA preceded by a synthetic T7 promoter. pT7-HCV also
contains the complete 5' UTR and the poly A version of the 3' UTR.

pCMV.ΔNS35: To generate pCMV.ΔNS35 (Figure 5, SEQ ID NO:3), a two-step
procedure was undertaken. First, a PCR product was generated from pT7-HCV
that corresponded to the following: a 5' EcoRI site, followed by the Kozak sequence
ACCATGG; the initiator ATG followed by amino acid #1242 and continuing to the
StuI site. Second, the StuI to XbaI fragment from a full-length genomic clone was
isolated. The genomic clone consisted of the T7 promoter fused to the full-length HCV
cDNA with the poly A version of the 3' end, in a pUC vector. Finally, the EcoRI-StuI
and StuI-XbaI fragments were ligated into the pCMV-II expression vector, transformed
into HB101 competent cells and plated onto ampicillin (100 µg/ml). Miniprep analyses
led to the identification of the desired clone, which was amplified on a larger scale using
a Qiagen Gigaprep kit following the manufacturer's specifications. The resulting clone
was named pCMV.ΔNS35 (Figure 5, SEQ ID NO:3).
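
The 5' junction described above (EcoRI site, Kozak sequence ACCATGG, initiator ATG, then the HCV coding sequence) can be checked with a short script. The Python sketch below is only an illustration: the junction bases are transcribed from the EcoRI region of the sequence listing that accompanies the figures, the codon table is deliberately partial (only the codons that occur in this stretch), and all variable names are hypothetical.

```python
ECORI_SITE = "GAATTC"
KOZAK = "ACCATGG"

# Partial codon table: only the codons present in the junction below.
CODONS = {"ATG": "M", "GCT": "A", "GCA": "A", "TAT": "Y",
          "CAG": "Q", "GGC": "G", "AAG": "K"}

# Bases transcribed from the sequence listing around the EcoRI site.
junction = "GAATTCACCATGGCTGCATATGCAGCTCAGGGCTATAAG"

# Locate the EcoRI site, then the Kozak sequence immediately downstream.
site = junction.find(ECORI_SITE)
kozak = junction.find(KOZAK, site + len(ECORI_SITE))

# The initiator ATG sits inside the Kozak sequence.
start = kozak + KOZAK.index("ATG")

# Translate in frame from the initiator ATG; unknown codons become "X".
codons = [junction[i:i + 3] for i in range(start, len(junction) - 2, 3)]
peptide = "".join(CODONS.get(codon, "X") for codon in codons)

print(site, kozak, start)
print(peptide)
```

The translated peptide begins with the initiator methionine followed by the first residues annotated above the sequence in the listing.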

pd.ΔNS3NS5: As shown schematically in Figure 10, the yeast expression
plasmid pd.ΔNS3NS5 (SEQ ID NO:8) was constructed using restriction fragments
obtained from the mammalian expression plasmid pCMV.KM.ΔNS35.
pCMV.KM.ΔNS35 is identical to pCMV.ΔNS35 (Figure 5, SEQ ID NO:3) except that
it contains a kanamycin resistance gene in the viral backbone. pCMV.KM.ΔNS35 was
digested with EcoRI and NheI to obtain a 2895bp EcoRI-NheI fragment. The EcoRI-NheI
fragment was ligated into a pRSET HindIII-NheI subcloning vector with oligos (HE)
from HindIII to EcoRI. After sequence verification, pRSET HindIII-NheI #6 was
digested with HindIII and NheI to obtain a 2908bp HindIII-NheI fragment.

pCMV.KM.ΔNS35 was linearized with XbaI and ligated with synthetic oligos
(XS) from XbaI-SalI. The ligation was digested with NheI and SalI to obtain a 2481bp
NheI-SalI fragment. The fragment was ligated into a pET3a NheI-SalI subcloning
vector. After sequence verification, pET3a NheI-SalI #2 was digested with NheI and
SalI to obtain a 2481bp NheI-SalI fragment. The BamHI-HindIII ADH2/GAPDH promoter
fragment was then ligated with the HindIII-NheI and NheI-SalI fragments into the pBS24.1
BamHI-SalI yeast expression vector.

pd.ΔNS3NS5.PJ: pd.ΔNS3NS5.PJ (Figures 13 and 14; SEQ ID NO:10) was
generated to create a "perfect junction" at the 5' and 3' ends of the HCV coding region.
At the 5' end of pd.ΔNS3NS5, there were 6 extra bases between the yeast
ADH2/GAPDH promoter and the ATG of the polypeptide. At the 3' end, there were 52
bases of untranslated sequence between the stop codon of the polypeptide and the
α-factor terminator in the yeast expression vector. pd.ΔNS3NS5.PJ was created by
digesting pd.ΔNS3NS5 #17 with ScaI and SphI to obtain a 4963bp ScaI-SphI fragment.
pd.NS5b3011 was digested with SphI and SalI to obtain a 321bp SphI-SalI fragment
which gave the "perfect junction" at the 3' end of the polypeptide. The ScaI-SphI and
SphI-SalI fragments were ligated into a pSP72 HindIII-SalI subcloning vector with
synthetic oligos from HindIII-ScaI (HS) for the "perfect junction" at the 5' end.

The region of synthetic sequence in pSP72 HindIII-SalI clone #6 was verified.
pSP72 HindIII-SalI clone #6 was digested with HindIII and BlnI or with BlnI and SalI
to obtain 2441bp HindIII-BlnI and 2895bp BlnI-SalI fragments, respectively. The
BamHI-HindIII ADH2/GAPDH promoter fragment was ligated to the HindIII-BlnI and
BlnI-SalI fragments into the pBS24.1 BamHI-SalI yeast expression vector.

pd.ΔNS3NS5.PJ.core121RT and pd.ΔNS3NS5.PJ.core173RT were generated
and encode HCV core aa 1-121 at the C-terminus of the ΔNS3NS5 polypeptide
(designated pd.ΔNS3NS5.PJ.core121RT, SEQ ID NO:12) and core aa 1-173 at the
C-terminus of the ΔNS3NS5 polypeptide (designated pd.ΔNS3NS5.PJ.core173RT, SEQ
ID NO:14). The core sequence had aa 9 mutated from Lys to Arg and aa 11 mutated
from Asn to Thr, designated as core 121RT or 173RT.

pd.ΔNS3NS5.PJ.core121RT (Figure 17, SEQ ID NO:12) and
pd.ΔNS3NS5.PJ.core173RT (Figure 18, SEQ ID NO:14): As shown in Figure 16,
NotI-SalI HCVcore121RT and HCVcore173RT fragments were amplified by PCR from an E. coli
expression plasmid, pSODCF2.HCVcore191RT #2. Either the core 121RT NotI-SalI
PCR product or the core 173RT NotI-SalI PCR product was ligated into a pT7Blue2
PstI-SalI subcloning vector with synthetic oligos (PN) from PstI to NotI. After
sequence confirmation, pT7Blue2core121RT clone #9 and pT7Blue2core173RT
clone #11 were digested with PstI and SalI to obtain 403bp and 559bp PstI-SalI
fragments, respectively, for further cloning.

A 121bp NotI-PstI fragment from pSP72 HindIII-SalI clone #6 was isolated as
described above during the cloning of pd.ΔNS3NS5.PJ. The NotI-PstI and PstI-SalI
fragments were assembled into a vector made by digesting pd.ΔNS3NS5.PJ clone #5
(described above) with NotI and SalI.

ΔNS3NS5 and Core 140 and Core 150: An HCV core epitope was found which
elicits CTLs in baboons (HCV core aa 121-135). Since pd.ΔNS3NS5.PJ.core121RT
ends right before this potentially important epitope and was expressed better than the
longer pd.ΔNS3NS5.PJ.core173RT construct (Example 2), two intermediate constructs
were made which include this epitope, possibly giving intermediate expression levels.
The two new constructs fused HCV core aa 1-140 or HCV core aa 1-150 to the
C-terminus of ΔNS3NS5.PJ.

pd.ΔNS3NS5.PJ.core140RT (Figure 21, SEQ ID NO:16) and
pd.ΔNS3NS5.PJ.core150RT (Figure 22, SEQ ID NO:18): As shown in Figure 20, a
PstI-SalI HCVcore140RT and a PstI-SalI HCVcore150RT fragment were amplified by
PCR from pd.ΔNS3NS5.PJ.core173RT clone #16. Either HCV core PstI-SalI
PCR product was ligated into a pT7Blue2 PstI-SalI subcloning vector. After sequence
confirmation, pT7Blue2core140RT clone #22 and pT7Blue2core150RT clone #26 were
digested with PstI and SalI to obtain 460bp and 490bp PstI-SalI fragments, respectively, for
further cloning.

A 121bp NotI-PstI fragment was isolated from pSP72 HindIII-SalI clone #6 (as
described above during the cloning of pd.ΔNS3NS5.PJ). The NotI-PstI and PstI-SalI
fragments were assembled into a vector made by digesting pd.ΔNS3NS5.PJ clone #5
(described above) with NotI and SalI.


Example 2: Protein Expression 

Various of the constructs described herein, encoding HCV-1 ΔNS3 to NS5
antigen (aa 1242-3011), were expressed in yeast. S. cerevisiae strain AD3 was
transformed with pd.ΔNS3NS5 and checked for expression. A stained protein band at
the expected molecular weight of 194 kD was not observed (Figure 12). Strain AD3
was also transformed with pd.ΔNS3NS5.PJ clone #5 and checked for expression. A
protein band of the expected molecular weight of 194 kD was detected (Figure 15).
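
As a rough plausibility check on the expected band size, the following Python sketch estimates the mass of a polypeptide spanning aa 1242-3011 from its residue count. The average residue mass of about 110 Da is a general rule of thumb assumed here, not a value taken from this document, and the function names are hypothetical; the result is only an approximation and ignores the initiator methionine and any modifications.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass


def residue_count(first_aa, last_aa):
    """Number of residues in an inclusive amino acid range."""
    return last_aa - first_aa + 1


def approx_mass_kd(n_residues, avg_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate polypeptide mass in kD from the residue count."""
    return n_residues * avg_mass_da / 1000.0


# HCV-1 aa 1242-3011, the range stated in the text.
n = residue_count(1242, 3011)
print(n, round(approx_mass_kd(n), 1))
```

Under this assumption the estimate lands close to the 194 kD stated above.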

Strain AD3 was transformed with pd.ΔNS3NS5.PJ.core121RT clone #6 and
pd.ΔNS3NS5.PJ.core173RT clone #15 and checked for expression. Protein bands of the
expected molecular weights of 206 kD and 210 kD, respectively, were observed.
Expression levels of the pd.ΔNS3NS5.PJ.core173RT construct were much lower than
those of the pd.ΔNS3NS5.PJ.core121RT construct (see Figure 19). Thus, protein
expression level decreases as the length of the HCV core sequence increases.

Strain AD3 was transformed with pd.ΔNS3NS5.PJ.core140RT clone #29 and
pd.ΔNS3NS5.PJ.core150RT clone #35 and checked for expression. Bands of the
expected molecular weights of 208 kD and 209 kD were seen by stain at levels close to
those of pd.ΔNS3NS5.PJ.core173RT (Figure 23).

Example 3: Eliciting Immune Responses 

A. Immunization

To evaluate the immunogenicity of the mutant NS polypeptides, studies using
guinea pigs, rabbits, mice, rhesus macaques and/or baboons are performed. The studies
are structured as follows: DNA immunization alone (single or multiple); DNA
immunization followed by protein immunization (boost); DNA immunization followed
by protein immunization; immunization by PLG particles. Immunization is
intramuscular or mucosal.

B. Humoral Immune Response

The humoral immune response is checked in serum specimens from immunized
animals with anti-NS antibody ELISAs (enzyme-linked immunosorbent assays) at
various times post-immunization. Briefly, serum from immunized animals is screened
for antibodies directed against the NS or mutant NS proteins. Wells of ELISA
microtiter plates are coated overnight with the selected HCV protein and washed four
times; subsequently, blocking is done with PBS-0.2% Tween (Sigma). After removal
of the blocking solution, diluted mouse serum is added. Sera are tested at various
dilutions. Microtiter plates are washed and incubated with a secondary,
peroxidase-coupled anti-mouse IgG antibody (Pierce, Rockford, IL). ELISA plates are
washed and 3,3',5,5'-tetramethylbenzidine (TMB; Pierce) is added per well. The optical
density of each well is measured. Titers are typically reported as the reciprocal of the
dilution of serum that gave a half-maximum optical density (O.D.). Similarly, generation
of neutralization of binding (NOB) antibodies can be measured by methods known in the
art.
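
The half-maximum titer described above can be estimated from a dilution series by interpolation. The Python sketch below is one possible way to do this, under stated assumptions: the half-maximum is taken as half of the highest O.D. observed in the series (no background subtraction), interpolation is linear in log10 of the reciprocal dilution, and the function name and example readings are hypothetical.

```python
import math


def half_max_titer(reciprocal_dilutions, optical_densities):
    """Reciprocal dilution at which the O.D. falls to half of its maximum.

    reciprocal_dilutions: ascending values such as 100, 400, 1600.
    optical_densities: the O.D. measured at each dilution.
    Returns None if the series never crosses the half-maximum.
    """
    half = max(optical_densities) / 2.0
    points = list(zip(reciprocal_dilutions, optical_densities))
    for (d1, od1), (d2, od2) in zip(points, points[1:]):
        if od1 >= half >= od2 and od1 > od2:
            # Interpolate between the two bracketing dilutions on a log scale.
            fraction = (od1 - half) / (od1 - od2)
            log_titer = math.log10(d1) + fraction * (math.log10(d2) - math.log10(d1))
            return 10 ** log_titer
    return None


# Hypothetical four-fold dilution series and O.D. readings.
titer = half_max_titer([100, 400, 1600, 6400], [2.0, 1.5, 0.5, 0.1])
print(f"half-maximum titer ~ 1:{titer:.0f}")
```

A curve fit (for example a four-parameter logistic) is a common alternative when the dilution series is dense enough; the interpolation here is only the simplest option.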

C. Cellular Immune Response 

The frequency of specific cytotoxic T-lymphocytes (CTL) is evaluated by a
standard chromium release assay of peptide-pulsed BALB/c mouse CD4 cells. Briefly,
spleen cells (effector cells, E) are obtained from the immunized BALB/c mice,
cultured, restimulated, and assayed for CTL activity against HCV peptide-pulsed target
cells. Cytotoxic activity is measured in a standard 51Cr release assay.
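
Results of a 51Cr release assay are conventionally expressed as percent specific lysis. The document does not give the formula, so the Python sketch below assumes the commonly used definition, 100 x (experimental - spontaneous) / (maximum - spontaneous), where spontaneous release comes from target cells alone and maximum release from fully lysed target cells. The function name, counts, and effector-to-target ratios are hypothetical.

```python
def percent_specific_lysis(experimental, spontaneous, maximum):
    """Percent specific 51Cr release (assumed standard formula).

    experimental: counts released from targets incubated with effector cells.
    spontaneous: counts released from targets incubated alone.
    maximum: counts released from fully lysed targets.
    """
    if maximum <= spontaneous:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (experimental - spontaneous) / (maximum - spontaneous)


# Hypothetical counts at three effector-to-target (E:T) ratios.
spontaneous_cpm, maximum_cpm = 500.0, 2500.0
lysis_by_ratio = {
    ratio: percent_specific_lysis(cpm, spontaneous_cpm, maximum_cpm)
    for ratio, cpm in {"50:1": 1500.0, "10:1": 1000.0, "2:1": 600.0}.items()
}
for ratio, lysis in lysis_by_ratio.items():
    print(f"E:T {ratio}: {lysis:.0f}% specific lysis")
```

Lysis that rises with the E:T ratio against peptide-pulsed targets, but not against unpulsed controls, is the pattern usually taken to indicate peptide-specific CTL activity.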


Example 4: Immunization with PLG-delivered DNA.

The polylactide-co-glycolide (PLG) polymers are obtained from Boehringer
Ingelheim, U.S.A. The PLG polymer is RG505, which has a copolymer ratio of 50/50
and a molecular weight of 65 kDa (manufacturer's data). Cationic microparticles with
adsorbed DNA are prepared using a modified solvent evaporation process, essentially
as described in Singh et al., Proc. Natl. Acad. Sci. USA (2000) 97:811-816. Briefly, the
microparticles are prepared by emulsifying a 5% w/v polymer solution in methylene
chloride with PBS at high speed using an IKA homogenizer. The primary emulsion is
then added to distilled water containing cetyl trimethyl ammonium bromide (CTAB)
(0.5% w/v). This results in the formation of a w/o/w emulsion which is stirred at
room temperature, allowing the methylene chloride to evaporate. The resulting
microparticles are washed in distilled water by centrifugation and freeze dried.
Following preparation, washing and collection, DNA is adsorbed onto the
microparticles by incubating cationic microparticles in a solution of DNA. The
microparticles are then separated by centrifugation, the pellet washed with TE buffer
and the microparticles are freeze dried, resuspended and administered to animals.
Antibody titers are measured by ELISA assays.






WO 01/38360 PCT/US00/32326
What is claimed is: 

1. An isolated mutant non-structural ("NS") HCV polypeptide comprising
a polypeptide having a mutation in the catalytic domain of NS3, wherein said mutation
functionally disrupts the catalytic domain.

2. The polypeptide of claim 1, wherein the mutation comprises a deletion. 

3. The polypeptide of claim 1, wherein the mutation comprises a
substitution.

4. The polypeptide of any of claims 1-3, wherein said NS polypeptide 
comprises NS3, NS4 and NS5. 

5. The polypeptide of any of claims 1-3, wherein said NS polypeptide
consists of NS3, NS4 and NS5.

6. The polypeptide of any of claims 1-3, wherein said NS polypeptide 
consists of NS3 and NS5. 


7. The polypeptide of claim 6, wherein NS5 consists of NS5a. 

8. The polypeptide of claim 6, wherein NS5 consists of NS5b. 

9. The polypeptide of any of claims 1-3, wherein said NS polypeptide
consists of NS3 and NS4.

10. The polypeptide of claim 9, wherein NS4 consists of NS4a. 

11. The polypeptide of claim 9, wherein NS4 consists of NS4b.

12. The polypeptide of claim 4, further comprising a second viral
polypeptide that is not NS3, NS4, or NS5 of HCV.

13. The polypeptide of claim 12, wherein the second viral polypeptide
comprises an HCV Core polypeptide ("C"), or fragment thereof.

14. The polypeptide of claim 13, wherein the C polypeptide is truncated. 

15. The polypeptide of claim 14, wherein the truncation is at amino acid
121.

16. The polypeptide of claim 12, wherein the polypeptide further comprises 
an HCV envelope protein ("E"). 

17. The polypeptide of claim 16, wherein the E is E1.

18. The polypeptide of claim 16, wherein the E is E2. 

19. A composition comprising 

(a) the polypeptide of any one of claims 1-18; and

(b) a pharmaceutically acceptable excipient.

20. An isolated and purified polynucleotide which encodes the mutant HCV 
polypeptide according to any one of claims 1-18. 


21. A composition comprising 

(a) the isolated purified polynucleotide of claim 20; and 

(b) a pharmaceutically acceptable excipient.

22. The composition of claim 21, wherein the polynucleotide is DNA.

23. The composition of claim 21, wherein the polynucleotide is in a
plasmid.

24. An expression vector comprising the polynucleotide of claim 20. 


25. An expression vector comprising the polynucleotide of SEQ ID NO:8. 

26. A host cell comprising the polynucleotide of claim 20.

27. The host cell of claim 26, wherein the cell is a yeast cell.

28. The host cell of claim 26, wherein the cell is a mammalian cell. 

29. The host cell of claim 26, wherein the cell is an insect cell. 


30. The host cell of claim 26, wherein the cell is a plant cell. 

31. The host cell of claim 26, wherein the polynucleotide comprises the
sequence of SEQ ID NO:8.

32. The polypeptide of claim 1, wherein the polypeptide further comprises
SEQ ID NO:9.

33. A method of preparing a mutant NS HCV polypeptide, wherein the
method comprises the steps of:

a. transforming a host cell with an expression vector according to
claim 24, under conditions wherein the polypeptide is expressed;
and

b. isolating the polypeptide.

34. The method of claim 33, wherein the host cell is a yeast cell.

35. The method of claim 33, wherein the host cell is a mammalian cell.

36. The method of claim 33, wherein the host cell is an insect cell.

37. The method of claim 33, wherein the host cell is a plant cell. 

38. An antibody that specifically binds to a polypeptide of any of claims 1-18.

39. The antibody of claim 38, wherein the antibody is a monoclonal 
antibody. 

40. The antibody of claim 38, wherein the antibody is a purified polyclonal
antibody.

41. A method of eliciting an immune response in a subject, comprising the
step of administering to the subject a polypeptide of any of claims 1-18.

42. A method of eliciting an immune response in a subject, comprising the
step of administering to the subject a polynucleotide of claim 20.

FIGURE 1

FIGURE 2

1


TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA 


81 


GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA 
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT 


161 


StuI 

GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG 
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC 


241 


AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA 
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT 


321 


ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT 
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA 


401 


CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT 
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT AT CAT TAG TT AATGCCCCAG TAATCAAGTA 


481 


AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT 
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA 


561 


GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT 
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA 


641 


AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC 
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GG AT AACTGC AGTTACTGCC ATTTACCGGG 


721 


GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC 
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG 


801 


CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA 
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGoGo i 


881 


TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG 
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACiG^ 


961 


CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG 
GTTTACCCGC CATCCGCACA TGCCACCCTC C AG AT AT ATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACClCTGv- 


1041 


CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC 
rrrhrrrrrr araAAArTGrt AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG 


1121 


GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA 
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT 


1201 


CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA 
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT 


1281 


TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT 
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA 


1361 


CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT 
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA 
1441


T ATT TAG AAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA 
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT 


TCTCCGACAT 
AGAGGCTGTA 


1521 


CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT 
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA 


CCACATCCGA GCCCTGGTCC 
GGTGTAGGCT CGGGACCAGG 


CATCCGTCCA 
GTAGGCAGGT 


1601 


GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT 


CTTAGGCACA GCACAATGCC 
GAATCCGTGT CGTGTTACGG 


CACCACCACC 


1681 


AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA 


CGGAGATTGG GCTCGCACCT 
GCCTCTAACC CGAGCGTGGA 


GGACGCAGAT 
CCTGCGTCTA 


1761 


GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGCCAGCT GAGTTGTTGT 
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA 


ATTCTGATAA GAGTCAGAGG 
TAAGACTATT CTCAGTCTCC 


TAACTCCCGT 
ATTGAGGGCA 


1841 


TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG 
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC 


CTGCCGCGCG CGCCACCAGA 
GACGGCGCGC GCGGTGGTCT 


CATAATAGCT 
GTATTATCGA 


+2 




ECORI 


M A A 


1921 


GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT 
CTGTCTGATT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA 


CGTCGACCTA AGAATTCACC 
GCAGCTGGAT TCTTAAGTGG 


ATGGCTGCAT 
TACCGACGTA 


+2 
2001 


YAAQ G Y K VLVL HPS V A A TLGF GAY 
ATGCAGCTCA GGGCTATAAG GTGCTAGTAC TCAACCCCTC TGTTGCTGCA ACACTGGGCT TTGGTGCTTA 
TACGTCGAGT CCCGATATTC CACGATCATG AGTTGGGGAG ACAACGACGT TGTGACCCGA AACCACGAAT 


M S K 
CATGTCCAAG 
GTACAGGTTC 


+2 
2081 


AHGI DPN I R T G V R T ITT 
GCTCATGGGA TCGATCCTAA CATCAGGACC GGGGTGAGAA CAATTACCAC 
CGAGTACCCT AGCTAGGATT GTAGTCCTGG CCCCACTCTT GTTAATGGTG 


GSP ITYS T Y G 
TGGCAGCCCC ATCACGTACT CCACCTACGG 
ACCGTCGGGG TAGTGCATGA GGTGGATGCC 


+ 2 
2161 


KFL ADGG CSG GAY Dili CDE CHS 
CAAGTTCCTT GCCGACGGCG GGTGCTCGGG GGGCGCTTAT GACATAATAA TTTGTGACGA GTGCCACTCC 
GTTCAAGGAA CGGCTGCCGC CCACGAGCCC CCCGCGAATA CTGTATTATT AAACACTGCT CACGGTGAGG 


T D A 
ACGGATGCCA 
TGCCTACGGT 


+ 2 
2241 


T S I L GIG TVLD Q A E TAG ARLV V L A 
CATCCATCTT GGGCATTGGC ACTGTCCTTG ACCAAGCAGA GACTGCGGGG GCGAGACTGG TTGTGCTCGC 
GTAGGTAGAA CCCGTAACCG TGACAGGAAC TGGTTCGTCT CTGACGCCCC CGCTCTGACC AACACGAGCG 


TAT 
CACCGCCACC 
GTGGCGGTGG 


*2 
2321 


P P G S V T V PHP NICE V A L STT GEIP F Y G 
CCTCCGGGCT CCGTCACTGT GCCCCATCCC AACATCGAGG AGGTTGCTCT GTCCACCACC GGAGAGATCC CTTTTTACGG 
GGAGGCCCGA GGCAGTGACA CGGGGTAGGG TTGTAGCTCC TCCAACGAGA CAGGTGGTGG CCTCTCTAGG GAAAAATGCC 


+ 2 
2401 


K A I P L E V IKG GRH L I F C HSK KKC 
CAAGGCTATC CCCCTCGAAG TAATGAAGGG GGGGAGACAT CTCATCTTCT GTCATTCAAA GAAGAAGTGC 
GTTCCGATAG GGGGAGCTTC ATTAGTTCCC CCCCTCTGTA GAGTAGAAGA CAGTAAGTTT CTTGTTCACG 


DEL 
GACGAACTCG 
CTGCTTGAGC 


+2 
2481 


A A K L VAL GINA V A Y YRG L D V S VIP 
CCGCAAAGCT GGTCGCATTG GGCATCAATG CCGTGCCCTA CTACCGCGGT CTTGACGTGT CCGTCATCCC 
GGCGTTTCGA CCAGCGTAAC CCGTAGTTAC GGCACCGGAT GATGGCGCCA GAACTGCACA GGCAGTAGGG 


T S G 
GACCAGCGGC 
CTGGTCGCCG 


*2 
2561 


DVVV VAT DA L MTGY TGD F D S VIDC NTC 
GATGTTGTCG TCGTGGCAAC CGATGCCCTC ATGACCGGCT ATACCGGCGA CTTCGACTCG GTGATAGACT GCAATACGTG 
CTACAACAGC AGCACCGTTG GCTACGGGAG TACTGGCCGA TATGGCCGCT GAAGCTGAGC CACTATCTGA CGTTATGCAC 
+2 V TQ TVDF SLD PTF TIET ITL P Q D A V S
2641 . TGTCACCCAG ACAGTCGATT TCAGCCTTGA CCCTACCTTC ACCATTGAGA CAATCACGCT CCCCCAAGAT GCTGTCTCCC 
ACAGTGGGTC TGTCAGCTAA AGTCGGAACT GGGATGGAAG TGGTAACTCT GTTAGTGCGA GGGGGTTCTA CGACAGAGGG 

+ 2 R T Q R • R G R TGRG KPG I Y R F V A P GER PSG 
2721 GCACTCAACG TCGGGGCAGG ACTGGCAGGG GGAAGCCAGG CAT CT AC AG A TTTGTGGCAC CGGGGGAGCG CCCCTCCGGC 
CGTGAGTTGC AGCCCCGTCC TGACCGTCCC CCTTCGGTCC GTAGATGTCT AAACACCGTG GCCCCCTCGC GGGGAGGCCG 



f2MF0S SVL CEC YDAG CAW Y E L TPAE T T V 
2801 ATGTTCGACT CGTCCGTCCT CTGTGAGTGC TATGACGCAG GCTGTGCTTG GTATGAGCTC ACGCCCGCCG AGACTACAGT 
TACAAGCTGA GCAGGCAGGA GACACTCACG ATACTGCGTC CGACACGAAC CATACTCGAG TGCGGGCGGC TCTGATGTCA 



*2 RLR A Y M N TPG LPV CQ 



W E G 



F T 
StuI 



2881 TAGGCTACGA GCGTACATGA ACACCCCGGG GCTTCCCGTG 
ATCCGATGCT CGCATGTACT TGTGGGGCCC CGAAGGGCAC 



TGCCAGGACC ATCTTGAATT TTGGGAGGGC GTCTTTACAG 
ACGGTCCTGG TAGAACTTAA AACCCTCCCG CAGAAATGTC 



+ 2GLTH IDA HFLS QTK 
StuI 

2961 GCCTCACTCA TATAGATGCC CACTTTCTAT CCCAGACAAA 

CGGAGTGAGT ATATCTACGG GTGAAAGATA GGGTCTGTTT 



QSG EMLP YLV A Y Q 



GCAGAGTGGG GAGAACCTTC CTTACCTGGT AGCGTACCAA 
CGTCTCACCC CTCTTGGAAG GAATGGACCA TCGCATGGTT 



+2ATVC A R A QAP P P S ' 
3041 GCCACCGTGT GCGCTAGGGC TCAAGCCCCT CCCCCATCGT 
CGGTGGCACA CGCGATCCCG AGTTCGGGGA GGGGGTAGCA 



1 DQM H KC LIRL KPT 
GGGACCAGAT GTGGAAGTGT TTGATTCGCC TCAAGCCCAC 
CCCTGGTCTA CACCTTCACA AACTAAGCGG AGTTCGGGTG 



+2 LHG P T P L L Y R LGA VQNE ITL THP VTK 
3121 CCTCCATGGG CCAACACCCC TGCTATACAG ACTGGGCGCT GTTCAGAATG AAATCACCCT GACGCACCCA GTCACCAAAT 
GGAGGTACCC GGTTGTGGGG ACGATATGTC TGACCCGCGA CAAGTCTTAC TTTAGTGGGA CTGCGTGGGT CAGTGGTTTA 

+ 2YIMT CMS ADLE VVT STW VLVG GVL A A L 
3201 ACATCATGAC ATGCATGTCG GCCGACCTGG AGGTCGTCAC GAGCACCTGG GTGCTCGTTG GCGGCGTCCT GGCTGCTTTG 
TGTAGTACTG TACGTACAGC CGGCTGGACC TCCAGCAGTG CTCGTGGACC CACGAGCAAC CGCCGCAGGA G.GACGAAAC 



+ 2 A A Y C LST GCV VIVG RVV LSG KPAI IPD 
3281 GCCGCGTATT GCCTGTCAAC AGGCTGCGTG GTCATAGTGG GCAGGGTCGT CTTGTCCGGG AAGCCGGCAA TCATACCTGA 
CGGCGCATAA CGGACAGTTG TCCGACGCAC CAGTATCACC CGTC CCAGCA GAACAGGCCC TTCGGCCGTT AGTATGGACT 

+ 2 REV LYRE FDE MEE CSQH LPY IEQ G M M 
3361 CAGGGAAGTC CTCTACCGAG AGTTCGATGA GATGGAAGAG TGCTCTCAGC ACTTACCGTA CATCGAGCAA GGGATGATGC 
GTCCCTTCAG GAGATGGCTC TCAAGCTACT CTACCTTCTC ACGAGAGTCG TGAATGGCAT GTAGCTCGTT CCCTACTACo 



+ 2LAE0 FKQ KAL G LLQ TAS RQAE VIA PAV 
3441 TCGCCGAGCA GTTCAAGCAG AAGGCCCTCG GCCTCCTGCA GACCGCGTCC CGTCAGGCAG AGGTTATCGC CCCTGCTGTC 
AGCGGCTCGT CAAGTTCGTC TTCCGGGAGC CGGAGGACGT CTGGCGCAGG GCAGTCCGTC TCCAATAGCG GGGACGACAG 



+2 0 T N QKL ETF WAKH MWN F I S GIQY L A G 

wi rAr,ArCAACT GGCAAAAACT CGAGACCTTC TGGGCGAAGC ATATGTGGAA CTTCATCAGT GGGATACAAT ACTTGGCGGG 
gtcS ggffSS ggSS SS ACCCGCTTCG TATACACCTT GAAGTAGTCA CCCTATGTIA TGAACCGCCC 

T<?T LP GN P A I ASL MAFT AAV TSP LTT 
3601 CTTGTCAACG CTGCCTGGTA ACCCCGCCAT TGCTTCATTG ATGGCTTTTA CAGCTOCTGT CACCAGCCCA CTAACCACTA 
GAACAGTTGC GACGGACCAT TGGGGCGGTA ACGAAGTAAC TACCGAAAA T GTCGACGACA GTGGTCGGGT GATTGGTGAT 

.neoTf t s* m TLGG W V A AQL A A P G AAT A F V 

3 «t GCC^AACCCT CCTCTTCAAC ATATTGGGGG ggtgcgtggc tgcccagctc gccgcccccg gtgccgctac tgcctttgtg 

CGGTTTGGGA GGAGAAGTTG TATAACCCCC CCACCCACCG ACGGGTCGAG CGGCGGGGGC CACGGCGATG ACGGAAACAC 
+ 2 G A G L A G A A I G S V G L GKV LID I L A G Y G A
3761 GGCGCTGGCT TAGCTGGCGC CGCCATCGGC AGTGTTGGAC TGGGGAAGGT CCTCATAGAC ATCCTTGCAG GGTATGGCGC 
CCGCGACCGA ATCGACCGCG GCGGTAGCCG TCACAACCTG ACCCCTTCCA GGAGTATCTG TAGGAACGTC CCATACCGCG 



+ 2 G V A G'ALV A F K IMS GEVP STE D L V NLL 
3841 GGGCGTGGCG GGAGCTCTTG TGGCATTCAA GATCATGAGC GGTGAGGTCC CCTCCACGGA GGACCTGGTC AATCTACTGC 
CCCGCACCGC CCTCGAGAAC ACCGTAAGTT CTAGTACTCG CCACTCCAGG GGAGGTGCCT CCTGGACCAG TTAGATGACG 



t2 P A I L SPG ALVV G V V C A A ILRR HVG PGE 
3921 CCGCCATCCT CTCGCCCGGA GCCCTCGTAG TCGGCGTGGT CTGTGCAGCA ATACTGCGCC GGCACGTTGG CCCGGGCGAG 
GGCGGTAGGA GAGCGGGCCT CGGGAGCATC AGCCGCACCA GACACGTCGT TATGACGCGG CCGTGCAACC GGGCCCGCTC 



+ 2 G A V Q WMN RLI AFAS R G N HVS PTHY VPS 
4001 GGGGCAGTGC AGTGGATGAA CCGGCTGATA GCCTTCGCCT CCCGGGGGAA CCATGTTTCC CCCACGCACT ACGTGCCGGA 
CCCCGTCACG TCACCTACTT GGCCGACTAT CGGAAGCGGA GGGCCCCCTT GGTACAAAGG GGGTGCGTGA TGCACGGCCT 



+ 2 SDA A A R V T A I LSS LTVT QLL RRL HQW 
4081 GAGCGATGCA GCTGCCCGCG TCACTGCCAT ACTCAGCAGC CTCACTGTAA CCCAGCTCCT GAGGCGACTG CACCAGTGGA 
CTCGCTACGT CGACGGGCGC AGTGACGGTA TGAGTCGTCG GAGTGACATT GGGTCGAGGA CTCCGCTGAC GTGGTCACCT 



+ 2ISSE CTT PCSG S W L RDI W D W I CEV LSD 
4161 TAAGCTCGGA GTGTACCACT CCATGCTCCG GTTCCTGGCT AAGGGACATC TGGGACTGGA TATGCGAGGT GTTGAGCGAC 
ATTCGAGCCT CACATGGTGA GGTACGAGGC CAACGACCGA TTCCCTGTAG ACCCTGACCT ATACGCTCCA CAACTCGCTG 

+ 2FKTW LKA KLM PQLP GIP F V S CQRG Y K G 

BamHI 



4241 TTTAAGACCT GGCTAAAAGC TAAGCTCATG CCACAGCTGC CTGGGATCCC CTTTGTGTCC TGCCAGCGCG GGTATAAGGG 
AAAT7CTGGA CCGATTTTCG ATTCGAGTAC GGTGTCGACG GACCCTAGGG GAAACACAGG ACGGTCGCGC CCATA7TCCC 



*2 VWR GDGI MHT RCH CGAE ITG HVK NGT 
4321 GGTCTGGCGA GGGGACGGCA TCATGCACAC TCGCTGCCAC TGTGGAGCTG AGATCACTGG ACATGTCAAA AACGGGACGA 
CCAGACCGCT CCCCTGCCGT AGTACGTGTG AGCGACGGTG ACACCTCGAC TCTAGTGACC TGTACAGTTT TTGCCCTGCT 



+ 2 M RIV GPR TCRN MWS GTF PINA Y T T GPC 
4401 TGAGGATCGT CGGTCCTAGG ACCTGCAGGA ACATGTGGAG TGGGACCTTC CCCATTAATG CCTACACCAC GGGCCCCTGT 
ACTCCTAGCA GCCAGGATCC TGGACGTCCT TGTACACCTC ACCCTGGAAG GGGTAATTAC GGATGTGGTG CCCGGGGACA 



+ 2TPLP APN Y T F A L W R V S A EEY VEIR QVG 
4 481 ACCCCCCTTC CTGCGCCGAA CTACACGTTC GCGCTATGGA GGGTGTCTGC AGAGGAATAC GTGGAGATAA GGCAGGTGGG 
TGGGGGGAAG GACGCGGCTT GATGTGCAAG CGCGATACCT CCCACAGACG TCTCCTTATG CACCTCTATT CCGTCCACC-w 



+ 2 D F H YVTG MTT DNL KCPC QVP SPE FFT 
4561 GGACTTCCAC TACGTGACGG GTATGACTAC TGACAATCTT AAATGCCCGT GCCAGGTCCC ATCGCCCGAA ^TTTTCACAG 
CCTGAAGGTG ATGCACTGCC CATACTGATG ACTGTTAGAA TTTACGGGCA CGGTCCAGGG TAGCGGGCTT AAAAAGTGTC 



+2ELDG VRL HRFA PPC KPL LREE VSF RVG 
4641 AATTGGACGG GGTGCGCCTA CATAGGTTTG CGCCCCCCTG CAAGCCCTTG CTGCGGGAGG AGGTATCATT CAGAGTAGGA 
TTAACCTGCC CCACGCGGAT GTATCCAAAC GCGGGGGGAC GTTCGGGAAC GACGCCCTCC TCCATAGTAA GTCTCATCCT 

+2LHEY PVG SQL PCEP EPD VAV LTSM LTD 
4721 CTCCACGAAT ACCCGGTAGG GTCGCAATTA CCTTGCGAGC CCGAACCGGA CGTGGCCGTG TTGACGTCCA TGCTCACTGA 
GAGGTGCTTA TGGGCCATCC CAGCGTTAAT GGAACGCTCG GGCTTGGCCT GCACCGGCAC AACTGCAGGT ACGAGTGACT 

+2 PSH ITAE A A G RRL ARGS PPS VAS SSA 
4801 TCCCTCCCAT ATAACAGCAG AGGCGGCCGG GCGAAGGTTG GCGAGGGGAT CACCCCCCTC TGTGGCCAGC TCCTCGGCTA 
AGGGAGGGTA TATTGTCGTC TCCGCCGGCC CGCTTCCAAC CGCTCCCCTA GTGGGGGGAG ACACCGGTCG AGGAGCCGAT 
+2 SQLS AP S L K A T CTA NHD SPDA ELI E A N 
4881 GCCAGCTATC CGCTCCATCT CTCAAGGCAA CTTGCACCGC TAACCATGAC TCCCCTGATG CTGAGCTCAT AGAGGCCAAC 
CGGTCGATAG GCGAGGTAGA GAGTTCCGTT GAACGTGGCG ATTGGTACTG AGGGGACTAC GACTCGAGTA TCTCCGGTTG 


+2 L L W R Q E M GGN ITRV E S E NKV V I L D S F D 
4961 CTCCTATGGA GGCAGGAGAT GGGCGGCAAC ATCACCAGGG TTGAGTCAGA AAACAAAGTG GTGATTCTGG ACTCCTTCGA 
GAGGATACCT CCGTCCTCTA CCCGCCGTTG TAGTGGTCCC AACTCAGTCT TTTGTTTCAC CACTAAGACC TGAGGAAGCT 


+2 P L V A E E D ERE ISV PAEI LRK SRR FAQ 
5041 TCCGCTTGTG GCGGAGGAGG ACGAGCGGGA GATCTCCGTA CCCGCAGAAA TCCTGCGGAA GTCTCGGAGA TTCGCCCAGG 
AGGCGAACAC CGCCTCCTCC TGCTCGCCCT CTAGAGGCAT GGGCGTCTTT AGGACGCCTT CAGAGCCTCT AAGCGGGTCC 


+2 A L P V W A R P D Y N PPL VET W KKP DYE P P V 
5121 CCCTGCCCGT TTGGGCGCGG CCGGACTATA ACCCCCCGCT AGTGGAGACG TGGAAAAAGC CCGACTACGA ACCACCTGTG 
GGGACGGGCA AACCCGCGCC GGCCTGATAT TGGGGGGCGA TCACCTCTGC ACCTTTTTCG GGCTGATGCT TGGTGGACAC 


+2 VHGC PLP PPK SPPV PPP RKK RTVV LTE 
5201 GTCCATGGCT GCCCGCTTCC ACCTCCAAAG TCCCCTCCTG TGCCTCCGCC TCGGAAGAAG CGGACGGTGG TCCTCACTGA 
CAGGTACCGA CGGGCGAAGG TGGAGGTTTC AGGGGAGGAC ACGGAGGCGG AGCCTTCTTC GCCTGCCACC AGGAGTGACT 


+2 STL STAL AEL A T R SFGS SST SGI TGD 
5281 ATCAACCCTA TCTACTGCCT TGGCCGAGCT CGCCACCAGA AGCTTTGGCA GCTCCTCAAC TTCCGGCATT ACGGGCGACA 
TAGTTGGGAT AGATGACGGA ACCGGCTCGA GCGGTGGTCT TCGAAACCGT CGAGGAGTTG AAGGCCGTAA TGCCCGCTGT 


+2 NTTT SSE PAPS GCP PDS DAES Y S S MPP 
5361 ATACGACAAC ATCCTCTGAG CCCGCCCCTT CTGGCTGCCC CCCCGACTCC GACGCTGAGT CCTATTCCTC CATGCCCCCC 
TATGCTGTTG TAGGAGACTC GGGCGGGGAA GACCGACGGG GGGGCTGAGG CTGCGACTCA GGATAAGGAG GTACGGGGGG 


+2 LEGE PGD PDL SDGS WST V S S E A N A EDV 
BamHI 

5441 CTGGAGGGGG AGCCTGGGGA TCCGGATCTT AGCGACGGGT CATGGTCAAC GGTCAGTAGT GAGGCCAACG CGGAGGATGT 
GACCTCCCCC TCGGACCCCT AGGCCTAGAA TCGCTGCCCA GTACCAGTTG CCAGTCATCA CTCCGGTTGC GCCTCCTACA 


+2 VCC S M S Y SWT GAL VTPC AAE EQK LPI 
5521 CGTGTGCTGC TCAATGTCTT ACTCTTGGAC AGGCGCACTC GTCACCCCGT GCGCCGCGGA AGAACAGAAA CTGCCCATCA 
GCACACGACG AGTTACAGAA TGAGAACCTG TCCGCGTGAG CAGTGGGGCA CGCGGCGCCT TCTTGTCTTT GACGGGTAGT 


+2 NALS NSL LRHH NLV YST TSRS ACQ RQK 
5601 ATGCACTAAG CAACTCGTTG CTACGTCACC ACAATTTGGT GTATTCCACC ACCTCACGCA GTGCTTGCCA AAGGCAGAAG 
TACGTGATTC GTTGAGCAAC GATGCAGTGG TGTTAAACCA CATAAGGTGG TGGAGTGCGT CACGAACGGT TTCCGTCTTC 


+2 KVTF DRL QVL DSHY Q D V LKE VKAA ASK 
5681 AAAGTCACAT TTGACAGACT GCAAGTTCTG GACAGCCATT ACCAGGACGT ACTCAAGGAG GTTAAAGCAG CGGCGTCAAA 
TTTCAGTGTA AACTGTCTGA CGTTCAAGAC CTGTCGGTAA TGGTCCTGCA TGAGTTCCTC CAATTTCGTC GCCGCAGTTT 


+2 VKA N L L S VEE ACS LTPP HSA KSK FGY 
5761 AGTGAAGGCT AACTTGCTAT CCGTAGAGGA AGCTTGCAGC CTGACGCCCC CACACTCAGC CAAATCCAAG TTTGGTTATG 
TCACTTCCGA TTGAACGATA GGCATCTCCT TCGAACGTCG GACTGCGGGG GTGTGAGTCG GTTTAGGTTC AAACCAATAC 


+2 GAKD VRC HARK A V T HIN SVWK DLL EDN 
5841 GGGCAAAAGA CGTCCGTTGC CATGCCAGAA AGGCCGTAAC CCACATCAAC TCCGTGTGGA AAGACCTTCT GGAAGACAAT 
CCCGTTTTCT GCAGGCAACG GTACGGTCTT TCCGGCATTG GGTGTAGTTG AGGCACACCT TTCTGGAAGA CCTTCTGTTA 


+2 VTPI DTT I M A K N E V FCV QPE KGGR KPA 
5921 GTAACACCAA TAGACACTAC CATCATGGCT AAGAACGAGG TTTTCTGCGT TCAGCCTGAG AAGGGGGGTC GTAAGCCAGC 
CATTGTGGTT ATCTGTGATG GTAGTACCGA TTCTTGCTCC AAAAGACGCA AGTCGGACTC TTCCCCCCAG CATTCGGTCG 
+ 2 R L I V F P D LGV R V C E K M A LYD VVT KLP 
6001 TCGTCTCATC GTGTTCCCCG ATCTGGGCGT GCGCGTGTGC GAAAAGATGG CTTTGTACGA CGTGGTTACA AAGCTCCCCT 
AGCAGAGTAG CACAAGGGGC TAGACCCGCA CGCGCACACG CTTTTCTACC GAAACATGCT GCACCAATGT TTCGAGGGGA 



+2 LAVM GSS YGFQ YSP GQR VEFL VQA WKS 
EcoRI 

6081 TGGCCGTGAT GGGAAGCTCC TACGGATTCC AATACTCACC AGGACAGCGG GTTGAATTCC TCGTGCAAGC GTGGAAGTCC 
ACCGGCACTA CCCTTCGAGG ATGCCTAAGG TTATGAGTGG TCCTGTCGCC CAACTTAAGG AGCACGTTCG CACCTTCAGG 


+2 KKTP MGF SYD TRCF DST VTE SDIR TEE 
6161 AAGAAAACCC CAATGGGGTT CTCGTATGAT ACCCGCTGCT TTGACTCCAC AGTCACTGAG AGCGACATCC GTACGGAGGA 
TTCTTTTGGG GTTACCCCAA GAGCATACTA TGGGCGACGA AACTGAGGTG TCAGTGACTC TCGCTGTAGG CATGCCTCCT 


+2 AIY QCCD LDP QAR VAIK SLT ERL YVG 
6241 GGCAATCTAC CAATGTTGTG ACCTCGACCC CCAAGCCCGC GTGGCCATCA AGTCCCTCAC CGAGAGGCTT TATGTTGGGG 
CCGTTAGATG GTTACAACAC TGGAGCTGGG GGTTCGGGCG CACCGGTAGT TCAGGGAGTG GCTCTCCGAA ATACAACCCC 



+2GPLT NSR GENC G Y R RCR ASGV LTT SCG 
6321 GCCCTCTTAC CAATTCAAGG GGGGAGAACT GCGGCTATCG CAGGTGCCGC GCGAGCGGCG TACTGACAAC TAGCTGTGGT 
CGGGAGAATG GTTAAGTTCC CCCCTCTTGA CGCCGATAGC GTCCACGGCG CGCTCGCCGC ATGACTGTTG ATCGACACCA 

+2 N T L T CYI KAR A A C R AAG LQD CTML VCG 
6401 AACACCCTCA CTTGCTACAT CAAGGCCCGG GCAGCCTGTC GAGCCGCAGG GCTCCAGGAC TGCACCATGC TCGTGTGTGG 
TTGTGGGAGT GAACGATGTA GTTCCGGGCC CGTCGGACAG CTCGGCGTCC CGAGGTCCTG ACGTGGTACG AGCACACACC 



+2 DDL V V I C ESA G V Q E D A A SLR AFT EAM 
6481 CGACGACTTA GTCGTTATCT GTGAAAGCGC GGGGGTCCAG GAGGACGCGG CGAGCCTGAG AGCCTTCACG GAGGCTATGA 
GCTGCTGAAT CAGCAATAGA CACTTTCGCG CCCCCAGGTC CTCCTGCGCC GCTCGGACTC TCGGAAGTGC CTCCGATACT 



+ 2 T R Y S APP G D P P Q P E YDL ELIT SCS S N V 
6561 CCAGGTACTC CGCCCCCCCT GGGGACCCCC CACAACCAGA ATACGACTTG GAGCTCATAA CATCATGCTC CTCCAACGTG 
GGTCCATGAG GCGGGGGGGA CCCCTGGGGG GTGTTGGTCT TATGCTGAAC CTCGAGTATT GTAGTACGAG GAGGTTGCAC 



+2 S V A H DGA GKR VYYL TRD PTT P L A R A A W 
6641 TCAGTCGCCC ACGACGGCGC TGGAAAGAGG GTCTACTACC TCACCCGTGA CCCTACAACC CCCCTCGCGA GAGCTGCGTG 
AGTCAGCGGG TGCTGCCGCG ACCTTTCTCC CAGATGATGG AGTGGGCACT GGGATGTTGG GGGGAGCGCT CTCGACGCAC 



+2 ETA RHTP VNS W L G NIIM FAP TLW ARM 
6721 GGAGACAGCA AGACACACTC CAGTCAATTC CTGGCTAGGC AACATAATCA TGTTTGCCCC CACACTGTGG GCGAGGATGA 
CCTCTGTCGT TCTGTGTGAG GTCAGTTAAG GACCGATCCG TTGTATTAGT ACAAACGGGG GTGTGACACC CGCTCCTACT 

+2 ILMT HFF S V L I A R D QLE Q A L D CEI YGA 
6801 TACTGATGAC CCATTTCTTT AGCGTCCTTA TAGCCAGGGA CCAGCTTGAA CAGGCCCTCG ATTGCGAGAT CTACGGGGCC 
ATGACTACTG GGTAAAGAAA TCGCAGGAAT ATCGGTCCCT GGTCGAACTT GTCCGGGAGC TAACGCTCTA GATGCCCCGG 



+2 CYSI EPL DLP PIIQ RLH GLS AFSL HSY 
6881 TGCTACTCCA TAGAACCACT GGATCTACCT CCAATCATTC AAAGACTCCA TGGCCTCAGC GCATTTTCAC TCCACAGTTA 
ACGATGAGGT ATCTTGGTGA CCTAGATGGA GGTTAGTAAG TTTCTGAGGT ACCGGAGTCG CGTAAAAGTG AGGTGTCAAT 

+2 SPG E I N R VAA CLR KLGV PPL RAW RHR 
6961 CTCTCCAGGT GAAATCAATA GGGTGGCCGC ATGCCTCAGA AAACTTGGGG TACCGCCCTT GCGAGCTTGG AGACACCGGG 
GAGAGGTCCA CTTTAGTTAT CCCACCGGCG TACGGAGTCT TTTGAACCCC ATGGCGGGAA CGCTCGAACC TCTGTGGCCC 



+2 ARSV RAR LLAR GGR A A I CGKY L F N WAV 
7041 CCCGGAGCGT CCGCGCTAGG CTTCTGGCCA GAGGAGGCAG GGCTGCCATA TGTGGCAAGT ACCTCTTCAA CTGGGCAGTA 
GGGCCTCGCA GGCGCGATCC GAAGACCGGT CTCCTCCGTC CCGACGGTAT ACACCGTTCA TGGAGAAGTT GACCCGTCAT 
+2 RTKL KLT PIA A A G Q LDL SGW F T A G YSG 
7121 AGAACAAAGC TCAAACTCAC TCCAATAGCG GCCGCTGGCC AGCTGGACTT GTCCGGCTGG TTCACGGCTG GCTACAGCGG 
TCTTGTTTCG AGTTTGAGTG AGGTTATCGC CGGCGACCGG TCGACCTGAA CAGGCCGACC AAGTGCCGAC CGATGTCGCC 



+ 2 GDI Y HSV SHA RPR W I W F CL L LLA A G V 
7201 GGGAGACATT TATCACAGCG TGTCTCATGC CCGGCCCCGC TGGATCTGGT TTTGCCTACT CCTGCTTGCT GCAGGGGTAG 
CCCTCTGTAA ATAGTGTCGC ACAGAGTACG GGCCGGGGCG ACCTAGACCA AAACGGATGA GGACGAACGA CGTCCCCATC 

+2GIYL LPN R 
7281 GCATCTACCT CCTCCCCAAC CGATGAAGGT TGGGGTAAAC ACTCCGGCCT AAAAAAAAAA AAAAATCTAG AAAGGCGCGC 
CGTAGATGGA GGAGGGGTTG GCTACTTCCA ACCCCATTTG TGAGGCCGGA TTTTTTTTTT TTTTTAGATC TTTCCGCGCG 

BamHI MluI 



7361 CAAGATATCA AGGATCCACT ACGCGTTAGA GCTCGCTGAT CAGCCTCGAC TGTGCCTTCT AGTTGCCAGC CATCTGTTGT 
GTTCTATAGT TCCTAGGTGA TGCGCAATCT CGAGCGACTA GTCGGAGCTG ACACGGAAGA TCAACGGTCG GTAGACAACA 

7441 TTGCCCCTCC CCCGTGCCTT CCTTGACCCT GGAAGGTGCC ACTCCCACTG TCCTTTCCTA ATAAAATGAG GAAATTGCAT 
AACGGGGAGG GGGCACGGAA GGAACTGGGA CCTTCCACGG TGAGGGTGAC AGGAAAGGAT TATTTTACTC CTTTAACGTA 



7521 CGCATTGTCT GAGTAGGTGT CATTCTATTC TGGGGGGTGG GGTGGGGCAG GACAGCAAGG GGGAGGATTG GGAAGACAAT 
GCGTAACAGA CTCATCCACA GTAAGATAAG ACCCCGCACC CCACCCCGTC CTGTCGTTCC CCCTCCTAAC CCTTCTGTTA 



7601 AGCAGGCATG CTGGGGAGCT CTTCCGCTTC CTCGCTCACT GACTCGCTGC GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT 
TCGTCCGTAC GACCCCTCGA GAAGGCGAAG GAGCGAGTGA CTGAGCGACG CGAGCCAGCA AGCCGACGCC GCTCGCCATA 

7681 CAGCTCACTC AAAGGCGGTA ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA ACATGTGAGC AAAAGGCCAG 
GTCGAGTGAG TTTCCGCCAT TATGCCAATA GGTGTCTTAG TCCCCTATTG CGTCCTTTCT TGTACACTCG TTTTCCGGTC 

7761 CAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATGACAAAAA 
GTTTTCCGGT CCTTGGCATT TTTCCGGCGC AACGACCGCA AAAAGGTATC CGAGGCGGGG GGACTGCTCG TAGTGTTTTT 



7841 TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA TAAAGATACC AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC 
AGCTGCGAGT TCAGTCTCCA CCGCTTTGGG CTGTCCTGAT ATTTCTATGG TCCGCAAAGG GGGACCTTCG AGGGAGCACG 

7921 GCTCTCCTGT TCCGACCCTG CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA GCGTGGCGCT TTCTCAATGC 
CGAGAGGACA AGGCTGGGAC GGCGAATGGC CTATGGACAG GCGGAAAGAG GGAAGCCCTT CGCACCGCGA AAGAGTTACG 

8001 TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA 
AGTGCGACAT CCATAGAGTC AAGCCACATC CAGCAAGCGA GGTTCGACCC GACACACGTG CTTGGGGGGC AAGTCGGGCT 

8081 CCGCTGCGCC TTATCCGGTA ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC GCCACTGGCA GCAGCCACTG 
GGCGACGCGG AATAGGCCAT TGATAGCAGA ACTCAGGTTG GGCCATTCTG TGCTGAATAG CGGTGACCGT CGTCGGTGAC 

8161 GTAACAGGAT TAGCAGAGCG AGGTATGTAG GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA 
CATTGTCCTA ATCGTCTCGC TCCATACATC CGCCACGATG TCTCAAGAAC TTCACCACCG GATTGATGCC GATGTGATCT 

8241 AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT CCGGCAAACA 
TCCTGTCATA AACCATAGAC GCGAGACGAC TTCGGTCAAT GGAAGCCTTT TTCTCAACCA TCGAGAACTA GGCCGTTTGT 

8321 AACCACCGCT GGTAGCGGTG GTTTTTTTGT TTGCAAGCAG CAGATTACGC GCAGAAAAAA AGGATCTCAA GAAGATCCTT 
TTGGTGGCGA CCATCGCCAC CAAAAAAACA AACGTTCGTC GTCTAATGCG CGTCTTTTTT TCCTAGAGTT CTTCTAGGAA 

8401 TGATCTTTTC TACGGGGTCT GACGCTCAGT GGAACGAAAA CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG 
ACTAGAAAAG ATGCCCCAGA CTGCGAGTCA CCTTGCTTTT GAGTGCAATT CCCTAAAACC AGTACTCTAA TAGTTTTTCC 
8481 ATCTTCACCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG 
TAGAAGTGGA TCTAGGAAAA TTTAATTTTT ACTTCAAAAT TTAGTTAGAT TTCATATATA CTCATTTGAA CCAGACTGTC 

8561 TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG 
AATGGTTACG AATTAGTCAC TCCGTGGATA GAGTCGCTAG ACAGATAAAG CAAGTAGGTA TCAACGGACT GAGGGGCAGC 



8641 TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT 

ACATCTATTG ATGCTATGCC CTCCCGAATG GTAGACCGGG GTCACGACGT TACTATGGCG CTCTGGGTGC GAGTGGCCGA 

8721 CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA 

GGTCTAAATA GTCGTTATTT GGTCGGTCGG CCTTCCCGGC TCGCGTCTTC ACCAGGACGT TGAAATAGGC GGAGGTAGGT 



8801 GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG 
CAGATAATTA ACAACGGCCC TTCGATCTCA TTCATCAAGC GGTCAATTAT CAAACGCGTT GCAACAACGG TAACGATGTC 

8881 GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC 
CGTAGCACCA CAGTGCGAGC AGCAAACCAT ACCGAAGTAA GTCGAGGCCA AGGGTTGCTA GTTCCGCTCA ATGTACTAGG 



8961 CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT 
GGGTACAACA CGTTTTTTCG CCAATCGAGG AAGCCAGGAG GCTAGCAACA GTCTTCATTC AACCGGCGTC ACAATAGTGA 



9041 CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA 
GTACCAATAC CGTCGTGACG TATTAAGAGA ATGACAGTAC GGTAGGCATT CTACGAAAAG ACACTGACCA CTCATGAGTT 

9121 CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCCAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT 
GGTTCAGTAA GACTCTTATC ACATACGCCG CTGGCTCAAC GAGAACGGGC CGCAGTTATG CCCTATTATG GCGCGGTGTA 

9201 AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC 
TCGTCTTGAA ATTTTCACGA GTAGTAACCT TTTGCAAGAA GCCCCGCTTT TGAGAGTTCC TAGAATGGCG ACAACTCTAG 

9281 CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA 
GTCAAGCTAC ATTGGGTGAG CACGTGGGTT GACTAGAAGT CGTAGAAAAT GAAAGTGGTC GCAAAGACCC ACTCGTTTTT 



9361 CAGGAAGGCA AAATGCCGCA AAAAAGGGAA TAAGGGCGAC ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT 
GTCCTTCCGT TTTACGGCGT TTTTTCCCTT ATTCCCGCTG TGCCTTTACA ACTTATGAGT ATGAGAAGGA AAAAGTTATA 

9441 TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT ACATATTTGA ATGTATTTAG AAAAATAAAC AAATAGGGGT 
ATAACTTCGT AAATAGTCCC AATAACAGAG TACTCGCCTA TGTATAAACT TACATAAATC TTTTTATTTG TTTATCCCCA 



9521 TCCGCGCACA TTTCCCCGAA AAGTGCCACC TGACGTCTAA GAAACCATTA TTATCATGAC ATTAACCTAT AAAAATAGGC 
AGGCGCGTGT AAAGGGGCTT TTCACGGTGG ACTGCAGATT CTTTGGTAAT AATAGTACTG TAATTGGATA TTTTTATCCG 



9601 GTATCACGAG GCCCTTTCGT C 
CATAGTGCTC CGGGAAAGCA G 
1 TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA 



81 GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA 
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT 



StuI 



161 GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG 
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC 



241 AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA 
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT 



321 ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT 
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA 



401 CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT 
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA 



481 AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT 
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA 



561 GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT 
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA 



641 AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC 
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC AGTTACTGCC ATTTACCGGG 



721 GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC 
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG 



801 CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA 
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT 



881 TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG 
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACTGC 



961 CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG 
GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC 



1041 CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC 
GGTAGGTGCG ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG 



1121 GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA 
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT 



1201 CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA 
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT 



1281 TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT 
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA 



1361 CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT 
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA 
1441 TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA TCTCCGACAT 
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT AGAGGCTGTA 


1521 CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA 
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 


1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA CTTAGGCACA GCACAATGCC CACCACCACC 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT GAATCCGTGT CGTGTTACGG GTGGTGGTGG 


1681 AGTGTGCCGC ACAAGGCCGT CGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT GGACGCAGAT 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA CCTGCGTCTA 


1761 GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT 
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA 


1841 TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG CGCCACCAGA CATAATAGCT 
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC GCGGTGGTCT GTATTATCGA 



+ 2 M A A 

EcoRI 



1921 GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCACC ATGGCTGCAT 
CTGTCTGATT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTGG TACCGACGTA 



+2 Y A A Q G Y K VLVL NPS V A A T L G F GAY MSK 
2001 ATGCAGCTCA GGGCTATAAG GTGCTAGTAC TCAACCCCTC TGTTGCTGCA ACACTGGGCT TTGGTGCTTA CATGTCCAAG 
TACGTCGAGT CCCGATATTC CACGATCATG AGTTGGGGAG ACAACGACGT TGTGACCCGA AACCACGAAT GTACAGGTTC 



+2AHGI DPN IRT GVRT ITT G S P ITYS TYG 
2081 GCTCATGGGA TCGATCCTAA CATCAGGACC GGGGTGAGAA CAATTACCAC TGGCAGCCCC ATCACGTACT CCACCTACGG 
CGAGTACCCT AGCTAGGATT GTAGTCCTGG CCCCACTCTT GTTAATGGTG ACCGTCGGGG TAGTGCATGA GGTGGATGCC 



+ 2 KFL ADGG CSG GAY DIII C D E CHS TDA 
2161 CAAGTTCCTT GCCGACGGCG GGTGCTCGGG GGGCGCTTAT GACATAATAA TTTGTGACGA GTGCCACTCC ACGGATGCCA 
GTTCAAGGAA CGGCTGCCGC CCACGAGCCC CCCGCGAATA CTGTATTATT AAACACTGCT CACGGTGAGG TGCCTACGGT 



+ 2TSIL GIG TVLD QAE TAG ARLV VLA TAT 
2241 CATCCATCTT GGGCATTGGC ACTGTCCTTG ACCAAGCAGA GACTGCGGGG GCGAGACTGG TTGTGCTCGC CACCGCCACC 
GTAGGTAGAA CCCGTAACCG TGACAGGAAC TGGTTCGTCT CTGACGCCCC CGCTCTGACC AACACGAGCG GTGGCGGTGG 



+ 2PPGS V T V PHP NIEE V A L STT GEIP FYG 
2321 CCTCCGGGCT CCGTCACTGT GCCCCATCCC AACATCGAGG AGGTTGCTCT GTCCACCACC GGAGAGATCC CTTTTTACGG 
GGAGGCCCGA GGCAGTGACA CGGGGTAGGG TTGTAGCTCC TCCAACGAGA CAGGTGGTGG CCTCTCTAGG GAAAAATGCC 



+ 2 KAI PLEV I K G GRH L I F C HSK KKC DEL 
2401 CAAGGCTATC CCCCTCGAAG TAATCAAGGG GGGGAGACAT CTCATCTTCT GTCATTCAAA GAAGAAGTGC GACGAACTCG 
GTTCCGATAG GGGGAGCTTC ATTAGTTCCC CCCCTCTGTA GAGTAGAAGA CAGTAAGTTT CTTCTTCACG CTGCTTGAGC 



+ 2 A A K L VAL GINA VAY YRG L D V S VIP TSG 
2481 CCGCAAAGCT GGTCGCATTG GGCATCAATG CCGTGGCCTA CTACCGCGGT CTTGACGTGT CCGTCATCCC GACCAGCGGC 
GGCGTTTCGA CCAGCGTAAC CCGTAGTTAC GGCACCGGAT GATGGCGCCA GAACTGCACA GGCAGTAGGG CTGGTCGCCG 

+2 D V V V VAT DAL MTGY TGD F D S VIDC NTC 
2561 GATGTTGTCG TCGTGGCAAC CGATGCCCTC ATGACCGGCT ATACCGGCGA CTTCGACTCG GTGATAGACT GCAATACGTG 
CTACAACAGC AGCACCGTTG GCTACGGGAG TACTGGCCGA TATGGCCGCT GAAGCTGAGC CACTATCTGA CGTTATGCAC 
+2 VTQ TVDF SLD PTF T I E T ITL PQD AVS 
2641 TGTCACCCAG ACAGTCGATT TCAGCCTTGA CCCTACCTTC ACCATTGAGA CAATCACGCT CCCCCAAGAT GCTGTCTCCC 
ACAGTGGGTC TGTCAGCTAA AGTCGGAACT GGGATGGAAG TGGTAACTCT GTTAGTGCGA GGGGGTTCTA CGACAGAGGG 



+2 RTQR R G R TGRG KPG I Y R FVAP GER PSG 
2721 GCACTCAACG TCGGGGCAGG ACTGGCAGGG GGAAGCCAGG CATCTACAGA TTTGTGGCAC CGGGGGAGCG CCCCTCCGGC 
CGTGAGTTGC AGCCCCGTCC TGACCGTCCC CCTTCGGTCC GTAGATGTCT AAACACCGTG GCCCCCTCGC GGGGAGGCCG 


+2 MFDS SVL CEC Y D A G CAW YEL TPAE T T V 
2801 ATGTTCGACT CGTCCGTCCT CTGTGAGTGC TATGACGCAG GCTGTGCTTG GTATGAGCTC ACGCCCGCCG AGACTACAGT 
TACAAGCTGA GCAGGCAGGA GACACTCACG ATACTGCGTC CGACACGAAC CATACTCGAG TGCGGGCGGC TCTGATGTCA 


+2 RLR AYMN TPG LPV CQDH LEF WEG VFT 
StuI 

2881 TAGGCTACGA GCGTACATGA ACACCCCGGG GCTTCCCGTG TGCCAGGACC ATCTTGAATT TTGGGAGGGC GTCTTTACAG 
ATCCGATGCT CGCATGTACT TGTGGGGCCC CGAAGGGCAC ACGGTCCTGG TAGAACTTAA AACCCTCCCG CAGAAATGTC 


+2 GLTH IDA HFLS QTK QSG ENLP YLV AYQ 
StuI 

2961 GCCTCACTCA TATAGATGCC CACTTTCTAT CCCAGACAAA GCAGAGTGGG GAGAACCTTC CTTACCTGGT AGCGTACCAA 
CGGAGTGAGT ATATCTACGG GTGAAAGATA GGGTCTGTTT CGTCTCACCC CTCTTGGAAG GAATGGACCA TCGCATGGTT 


+2 ATVC A R A QAP PPSW DQM W K C LIRL KPT 
3041 GCCACCGTGT GCGCTAGGGC TCAAGCCCCT CCCCCATCGT GGGACCAGAT GTGGAAGTGT TTGATTCGCC TCAAGCCCAC 
CGGTGGCACA CGCGATCCCG AGTTCGGGGA GGGGGTAGCA CCCTGGTCTA CACCTTCACA AACTAAGCGG AGTTCGGGTG 


+2 LHG PTPL LYR LGA VQNE ITL THP VTK 
3121 CCTCCATGGG CCAACACCCC TGCTATACAG ACTGGGCGCT GTTCAGAATG AAATCACCCT GACGCACCCA GTCACCAAAT 
GGAGGTACCC GGTTGTGGGG ACGATATGTC TGACCCGCGA CAAGTCTTAC TTTAGTGGGA CTGCGTGGGT CAGTGGTTTA 


+2 YIMT CMS ADLE VVT STW VLVG GVL AAL 
3201 ACATCATGAC ATGCATGTCG GCCGACCTGG AGGTCGTCAC GAGCACCTGG GTGCTCGTTG GCGGCGTCCT GGCTGCTTTG 
TGTAGTACTG TACGTACAGC CGGCTGGACC TCCAGCAGTG CTCGTGGACC CACGAGCAAC CGCCGCAGGA CCGACGAAAC 


+2 A A Y C LST GCV VIVG RVV LSG KPAI IPD 
3281 GCCGCGTATT GCCTGTCAAC AGGCTGCGTG GTCATAGTGG GCAGGGTCGT CTTGTCCGGG AAGCCGGCAA TCATACCTGA 
CGGCGCATAA CGGACAGTTG TCCGACGCAC CAGTATCACC CGTCCCAGCA GAACAGGCCC TTCGGCCGTT AGTATGGACT 


+2 REV LYRE FDE M E E CSQH LPY IEQ GMM 
3361 CAGGGAAGTC CTCTACCGAG AGTTCGATGA GATGGAAGAG TGCTCTCAGC ACTTACCGTA CATCGAGCAA GGGATGATGC 
GTCCCTTCAG GAGATGGCTC TCAAGCTACT CTACCTTCTC ACGAGAGTCG TGAATGGCAT GTAGCTCGTT CCCTACTACG 


+2 LAEQ FKQ KALG LLQ TAS RQAE V I A P A V 
3441 TCGCCGAGCA GTTCAAGCAG AAGGCCCTCG GCCTCCTGCA GACCGCGTCC CGTCAGGCAG AGGTTATCGC CCCTGCTGTC 
AGCGGCTCGT CAAGTTCGTC TTCCGGGAGC CGGAGGACGT CTGGCGCAGG GCAGTCCGTC TCCAATAGCG GGGACGACAG 


+2 QTNW QKL ETF WAKH MWN F I S GIQY LAG 
3521 CAGACCAACT GGCAAAAACT CGAGACCTTC TGGGCGAAGC ATATGTGGAA CTTCATCAGT GGGATACAAT ACTTGGCGGG 
GTCTGGTTGA CCGTTTTTGA GCTCTGGAAC ACCCGCTTCG TATACACCTT GAAGTAGTCA CCCTATGTTA TGAACCGCCC 


+2 LST LPGN PAI ASL MAFT AAV TSP LTT 
3601 CTTGTCAACG CTGCCTGGTA ACCCCGCCAT TGCTTCATTG ATGGCTTTTA CAGCTGCTGT CACCAGCCCA CTAACCACTA 
GAACAGTTGC GACGGACCAT TGGGGCGGTA ACGAAGTAAC TACCGAAAAT GTCGACGACA GTGGTCGGGT GATTGGTGAT 


+2 SQTL LFN ILGG W V A AQL AAPG AAT AFV 
3681 GCCAAACCCT CCTCTTCAAC ATATTGGGGG GGTGGGTGGC TGCCCAGCTC GCCGCCCCCG GTGCCGCTAC TGCCTTTGTG 
CGGTTTGGGA GGAGAAGTTG TATAACCCCC CCACCCACCG ACGGGTCGAG CGGCGGGGGC CACGGCGATG ACGGAAACAC 
+2 G A G L A G A A I G S V G L GKV LID ILAG 
3761 GGCGCTGGCT TAGCTGGCGC CGCCATCGGC AGTGTTGGAC TGGGGAAGGT CCTCATAGAC ATCCTTGCAG GGTATGGCGC 
CCGCGACCGA ATCGACCGCG GCGGTAGCCG TCACAACCTG ACCCCTTCCA GGAGTATCTG TAGGAACGTC CCATACCGCG 



+2 VA G A L V AFK IMS GEVP S T E D L V N L L 
3841 GGGCGTGGCG GGAGCTCTTG TGGCATTCAA GATCATGAGC GGTGAGGTCC CCTCCACGGA GGACCTGGTC AATCTACTGC 
CCCGCACCGC CCTCGAGAAC ACCGTAAGTT CTAGTACTCG CCACTCCAGG GGAGGTGCCT CCTGGACCAG TTAGATGACG 



+2 P A I L SPG ALVV G V V CAA I L R R H V G P G E 
3921 CCGCCATCCT CTCGCCCGGA GCCCTCGTAG TCGGCGTGGT CTGTGCAGCA ATACTGCGCC GGCACGTTGG CCCGGGCGAG 
GGCGGTAGGA GAGCGGGCCT CGGGAGCATC AGCCGCACCA GACACGTCGT TATGACGCGG CCGTGCAACC GGGCCCGCTC 



+2 G A V Q WMN RLI A F A S R G N HVS P T H Y VP E 
4001 GGGGCAGTGC AGTGGATGAA CCGGCTGATA GCCTTCGCCT CCCGGGGGAA CCATGTTTCC CCCACGCACT ACGTGCCGGA 
CCCCGTCACG TCACCTACTT GGCCGACTAT CGGAAGCGGA GGGCCCCCTT GGTACAAAGG GGGTGCGTGA TGCACGGCCT 

+ 2 SDA AARV T A I LSS LTVT QLL RRL HQW 
4081 GAGCGATGCA GCTGCCCGCG TCACTGCCAT ACTCAGCAGC CTCACTGTAA CCCAGCTCCT GAGGCGACTG CACCAGTGGA 
CTCGCTACGT CGACGGGCGC AGTGACGGTA TGAGTCGTCG GAGTGACATT GGGTCGAGGA CTCCGCTGAC GTGGTCACCT 



+2 ISSE CTT PCSG SWL RDI W D W I C E V LSD 
4161 TAAGCTCGGA GTGTACCACT CCATGCTCCG GTTCCTGGCT AAGGGACATC TGGGACTGGA TATGCGAGGT GTTGAGCGAC 
ATTCGAGCCT CACATGGTGA GGTACGAGGC CAAGGACCGA TTCCCTGTAG ACCCTGACCT ATACGCTCCA CAACTCGCTG 

+2 FKTW LKA K L M PQLP G I P F V S CQRG YKG 

BamHI 

4241 TTTAAGACCT GGCTAAAAGC TAAGCTCATG CCACAGCTGC CTGGGATCCC CTTTGTGTCC TGCCAGCGCG GGTATAAGGG 
AAATTCTGGA CCGATTTTCG ATTCGAGTAC GGTGTCGACG GACCCTAGGG GAAACACAGG ACGGTCGCGC CCATATTCCC 

+2 VWR GDGI MHT RCH CGAE I T G HVK NGT 
4321 GGTCTGGCGA GGGGACGGCA TCATGCACAC TCGCTGCCAC TGTGGAGCTG AGATCACTGG ACATGTCAAA AACGGGACGA 
CCAGACCGCT CCCCTGCCGT AGTACGTGTG AGCGACGGTG ACACCTCGAC TCTAGTGACC TGTACAGTTT TTGCCCTGCT 

+2 M R I V GPR TCRN MWS GTF P I N A Y T T GPC 
4401 TGAGGATCGT CGGTCCTAGG ACCTGCAGGA ACATGTGGAG TGGGACCTTC CCCATTAATG CCTACACCAC GGGCCCCTGT 
ACTCCTAGCA GCCAGGATCC TGGACGTCCT TGTACACCTC ACCCTGGAAG GGGTAATTAC GGATGTGGTG CCCGGGGACA 

+2 T P L P A P N YTF ALWR V S A E E Y V E I R Q V G 
4481 ACCCCCCTTC CTGCGCCGAA CTACACGTTC GCGCTATGGA GGGTGTCTGC AGAGGAATAC GTGGAGATAA GGCAGGTGGG 
TGGGGGGAAG GACGCGGCTT GATGTGCAAG CGCGATACCT CCCACAGACG TCTCCTTATG CACCTCTATT CCGTCCACCC 

+ 2 DFH YVTG MTT DNL KCPC QVP SPE FFT 
4561 GGACTTCCAC TACGTGACGG GTATGACTAC TGACAATCTT AAATGCCCGT GCCAGGTCCC ATCGCCCGAA TTTTTCACAG 
CCTGAAGGTG ATGCACTGCC CATACTGATG ACTGTTAGAA TTTACGGGCA CGGTCCAGGG TAGCGGGCTT AAAAAGTGTC 

+2ELDG VRL HRFA PPC KPL LREE VSF RVG 
4641 AATTGGACGG GGTGCGCCTA CATAGGTTTG CGCCCCCCTG CAAGCCCTTG CTGCGGGAGG AGGTATCATT CAGAGTAGGA 
TTAACCTGCC CCACGCGGAT GTATCCAAAC GCGGGGGGAC GTTCGGGAAC GACGCCCTCC TCCATAGTAA GTCTCATCCT 

+2LHEY PVG SQL PCEP EPD V A V LTSM LTD 
4721 CTCCACGAAT ACCCGGTAGG GTCGCAATTA CCTTGCGAGC CCGAACCGGA CGTGGCCGTG TTGACGTCCA TGCTCACTGA 
GAGGTGCTTA TGGGCCATCC CAGCGTTAAT GGAACGCTCG GGCTTGGCCT GCACCGGCAC AACTGCAGGT ACGAGTGACT 

+ 2 PSH ITAE A A G RRL ARGS PPS V A S SSA 
4801 TCCCTCCCAT ATAACAGCAG AGGCGGCCGG GCGAAGGTTG GCGAGGGGAT CACCCCCCTC TGTGGCCAGC TCCTCGGCTA 
AGGGAGGGTA TATTGTCGTC TCCGCCGGCC CGCTTCCAAC CGCTCCCCTA GTGGGGGGAG ACACCGGTCG AGGAGCCGAT 
+2 SQLS APS L K A T C T A NHD S P D A ELI E A N 
4881 GCCAGCTATC CGCTCCATCT CTCAAGGCAA CTTGCACCGC TAACCATGAC TCCCCTGATG CTGAGCTCAT AGAGGCCAAC 
CGGTCGATAG GCGAGGTAGA GAGTTCCGTT GAACGTGGCG ATTGGTACTG AGGGGACTAC GACTCGAGTA TCTCCGGTTG 



+2 LLWR QEM GGN ITRV ESE N K V VILD S F D 
4961 CTCCTATGGA GGCAGGAGAT GGGCGGCAAC ATCACCAGGG TTGAGTCAGA AAACAAAGTG GTGATTCTGG ACTCCTTCGA 
GAGGATACCT CCGTCCTCTA CCCGCCGTTG TAGTGGTCCC AACTCAGTCT TTTGTTTCAC CACTAAGACC TGAGGAAGCT 

+2 PLV AEED ERE ISV P A E I LRK S R R F A Q 
5041 TCCGCTTGTG GCGGAGGAGG ACGAGCGGGA GATCTCCGTA CCCGCAGAAA TCCTGCGGAA GTCTCGGAGA TTCGCCCAGG 
AGGCGAACAC CGCCTCCTCC TGCTCGCCCT CTAGAGGCAT GGGCGTCTTT AGGACGCCTT CAGAGCCTCT AAGCGGGTCC 



+ 2 A L P V W A R P D Y N PPL VET WKKP DYE P P V 
5121 CCCTGCCCGT TTGGGCGCGG CCGGACTATA ACCCCCCGCT AGTGGAGACG TGGAAAAAGC CCGACTACGA ACCACCTGTG 
GGGACGGGCA AACCCGCGCC GGCCTGATAT TGGGGGGCGA TCACCTCTGC ACCTTTTTCG GGCTGATGCT TGGTGGACAC 



+2VHGC PLP PPK SPPV P P P RKK RTVV LTE 
5201 GTCCATGGCT GCCCGCTTCC ACCTCCAAAG TCCCCTCCTG TGCCTCCGCC TCGGAAGAAG CGGACGGTGG TCCTCACTGA 
CAGGTACCGA CGGGCGAAGG TGGAGGTTTC AGGGGAGGAC ACGGAGGCGG AGCCTTCTTC GCCTGCCACC AGGAGTGACT 

+2 STL STAL AEL A T R SFGS SST SGI TGD 
5281 ATCAACCCTA TCTACTGCCT TGGCCGAGCT CGCCACCAGA AGCTTTGGCA GCTCCTCAAC TTCCGGCATT ACGGGCGACA 
TAGTTGGGAT AGATGACGGA ACCGGCTCGA GCGGTGGTCT TCGAAACCGT CGAGGAGTTG AAGGCCGTAA TGCCCGCTGT 



+2 NTTT SSE PAPS GCP PDS DAES Y S S MPP 
5361 ATACGACAAC ATCCTCTGAG CCCGCCCCTT CTGGCTGCCC CCCCGACTCC GACGCTGAGT CCTATTCCTC CATGCCCCCC 
TATGCTGTTG TAGGAGACTC GGGCGGGGAA GACCGACGGG GGGGCTGAGG CTGCGACTCA GGATAAGGAG GTACGGGGGG 

+2 L E G E PGD PDL SDGS WST VSS EANA EDV 
BamHI 



5441 CTGGAGGGGG AGCCTGGGGA TCCGGATCTT AGCGACGGGT CATGGTCAAC GGTCAGTAGT GAGGCCAACG CGGAGGATGT 
GACCTCCCCC TCGGACCCCT AGGCCTAGAA TCGCTGCCCA GTACCAGTTG CCAGTCATCA CTCCGGTTGC GCCTCCTACA 

+2 VCC S M S Y SWT GAL VTPC A A E EQK LPI 
5521 CGTGTGCTGC TCAATGTCTT ACTCTTGGAC AGGCGCACTC GTCACCCCGT GCGCCGCGGA AGAACAGAAA CTGCCCATCA 
GCACACGACG AGTTACAGAA TGAGAACCTG TCCGCGTGAG CAGTGGGGCA CGCGGCGCCT TCTTGTCTTT GACGGGTAGT 



+2NALS NSL LRHH NLV YST TSRS ACQ RQK 
5601 ATGCACTAAG CAACTCGTTG CTACGTCACC ACAATTTGGT GTATTCCACC ACCTCACGCA GTGCTTGCCA AAGGCAGAAG 
TACGTGATTC GTTGAGCAAC GATGCAGTGG TGTTAAACCA CATAAGGTGG TGGAGTGCGT CACGAACGGT TTCCGTCTTC 



+2 K V T F DRL Q V L DSHY QDV LKE V K A A ASK 
5681 AAAGTCACAT TTGACAGACT GCAAGTTCTG GACAGCCATT ACCAGGACGT ACTCAAGGAG GTTAAAGCAG CGGCGTCAAA 
TTTCAGTGTA AACTGTCTGA CGTTCAAGAC CTGTCGGTAA TGGTCCTGCA TGAGTTCCTC CAATTTCGTC GCCGCAGTTT 



+2 VKA NLLS VEE ACS LTPP HSA KSK FGY 
5761 AGTGAAGGCT AACTTGCTAT CCGTAGAGGA AGCTTGCAGC CTGACGCCCC CACACTCAGC CAAATCCAAG TTTGGTTATG 
TCACTTCCGA TTGAACGATA GGCATCTCCT TCGAACGTCG GACTGCGGGG GTGTGAGTCG GTTTAGGTTC AAACCAATAC 



+2 G A K D VRC HARK A V T HIN SVWK DLL EDN 
5841 GGGCAAAAGA CGTCCGTTGC CATGCCAGAA AGGCCGTAAC CCACATCAAC TCCGTGTGGA AAGACCTTCT GGAAGACAAT 
CCCGTTTTCT GCAGGCAACG GTACGGTCTT TCCGGCATTG GGTGTAGTTG AGGCACACCT TTCTGGAAGA CCTTCTGTTA 



+2 V T P I DTT IMA K N E V F C V QPE KGGR K P A 
5921 GTAACACCAA TAGACACTAC CATCATGGCT AAGAACGAGG TTTTCTGCGT TCAGCCTGAG AAGGGGGGTC GTAAGCCAGC 
CATTGTGGTT ATCTGTGATG GTAGTACCGA TTCTTGCTCC AAAAGACGCA AGTCGGACTC TTCCCCCCAG CATTCGGTCG 
+2 R L I VFPD L G V RVC E K M A LYD V V T K L P 
6001 TCGTCTCATC GTGTTCCCCG ATCTGGGCGT GCGCGTGTGC GAAAAGATGG CTTTGTACGA CGTGGTTACA AAGCTCCCCT 
AGCAGAGTAG CACAAGGGGC TAGACCCGCA CGCGCACACG CTTTTCTACC GAAACATGCT GCACCAATGT TTCGAGGGGA 

+2 LAVM GSS YGFQ Y S P GQR V E F L V Q A WKS 

EcoRI 



6081 TGGCCGTGAT GGGAAGCTCC TACGGATTCC AATACTCACC AGGACAGCGG GTTGAATTCC TCGTGCAAGC GTGGAAGTCC 
ACCGGCACTA CCCTTCGAGG ATGCCTAAGG TTATGAGTGG TCCTGTCGCC CAACTTAAGG AGCACGTTCG CACCTTCAGG 



+ 2KKTP MGF SYD TRCF DST VTE SDIR TEE 
6161 AAGAAAACCC CAATGGGGTT CTCGTATGAT ACCCGCTGCT TTGACTCCAC AGTCACTGAG AGCGACATCC GTACGGAGGA 
TTCTTTTGGG GTTACCCCAA GAGCATACTA TGGGCGACGA AACTGAGGTG TCAGTGACTC TCGCTGTAGG CATGCCTCCT 



+2 A I Y QCCD LDP Q A R V A I K SLT ERL Y V G 
6241 GGCAATCTAC CAATGTTGTG ACCTCGACCC CCAAGCCCGC GTGGCCATCA AGTCCCTCAC CGAGAGGCTT TATGTTGGGG 
CCGTTAGATG GTTACAACAC TGGAGCTGGG GGTTCGGGCG CACCGGTAGT TCAGGGAGTG GCTCTCCGAA ATACAACCCC 



+2GPLT NSR GENC GYR RCR A S G V LTT SCG 
6321 GCCCTCTTAC CAATTCAAGG GGGGAGAACT GCGGCTATCG CAGGTGCCGC GCGAGCGGCG TACTGACAAC TAGCTGTGGT 
CGGGAGAATG GTTAAGTTCC CCCCTCTTGA CGCCGATAGC GTCCACGGCG CGCTCGCCGC ATGACTGTTG ATCGACACCA 



+2NTLT CYI KAR AACR A A G LQD CTML VCG 
6401 AACACCCTCA CTTGCTACAT CAAGGCCCGG GCAGCCTGTC GAGCCGCAGG GCTCCAGGAC TGCACCATGC TCGTGTGTGG 
TTGTGGGAGT GAACGATGTA GTTCCGGGCC CGTCGGACAG CTCGGCGTCC CGAGGTCCTG ACGTGGTACG AGCACACACC 



+2 DDL VVIC ESA GVQ EDAA SLR AFT E A M 
6481 CGACGACTTA GTCGTTATCT GTGAAAGCGC GGGGGTCCAG GAGGACGCGG CGAGCCTGAG AGCCTTCACG GAGGCTATGA 
GCTGCTGAAT CAGCAATAGA CACTTTCGCG CCCCCAGGTC CTCCTGCGCC GCTCGGACTC TCGGAAGTGC CTCCGATACT 



+2TRYS APP GDPP Q P E YDL ELIT SCS SNV 
6561 CCAGGTACTC CGCCCCCCCT GGGGACCCCC CACAACCAGA ATACGACTTG GAGCTCATAA CATCATGCTC CTCCAACGTG 
GGTCCATGAG GCGGGGGGGA CCCCTGGGGG GTGTTGGTCT TATGCTGAAC CTCGAGTATT GTAGTACGAG GAGGTTGCAC 



+2 S V A H DGA GKR VYYL TRD PTT PLAR A A W 
6641 TCAGTCGCCC ACGACGGCGC TGGAAAGAGG GTCTACTACC TCACCCGTGA CCCTACAACC CCCCTCGCGA GAGCTGCGTG 
AGTCAGCGGG TGCTGCCGCG ACCTTTCTCC CAGATGATGG AGTGGGCACT GGGATGTTGG GGGGAGCGCT CTCGACGCAC 



+2 ETA RHTP VNS WLG NIIM FAP TLW ARM 
6721 GGAGACAGCA AGACACACTC CAGTCAATTC CTGGCTAGGC AACATAATCA TGTTTGCCCC CACACTGTGG GCGAGGATGA 
CCTCTGTCGT TCTGTGTGAG GTCAGTTAAG GACCGATCCG TTGTATTAGT ACAAACGGGG GTGTGACACC CGCTCCTACT 



+2ILMT HFF S V L I A R D QLE QALD CEI YGA
6801 TACTGATGAC CCATTTCTTT AGCGTCCTTA TAGCCAGGGA CCAGCTTGAA CAGGCCCTCG ATTGCGAGAT CTACGGGGCC 
ATGACTACTG GGTAAAGAAA TCGCAGGAAT ATCGGTCCCT GGTCGAACTT GTCCGGGAGC TAACGCTCTA GATGCCCCGG 



+2CYSI EPL D L P P IIQ RLH GLS A F S L HSY 
6881 TGCTACTCCA TAGAACCACT GGATCTACCT CCAATCATTC AAAGACTCCA TGGCCTCAGC GCATTTTCAC TCCACAGTTA 
ACGATGAGGT ATCTTGGTGA CCTAGATGGA GGTTAGTAAG TTTCTGAGGT ACCGGAGTCG CGTAAAAGTG AGGTGTCAAT 



+2 SPG E I N R VAA CLR KLGV PPL RAW RHR
6961 CTCTCCAGGT GAAATCAATA GGGTGGCCGC ATGCCTCAGA AAACTTGGGG TACCGCCCTT GCGAGCTTGG AGACACCGGG 
GAGAGGTCCA CTTTAGTTAT CCCACCGGCG TACGGAGTCT TTTGAACCCC ATGGCGGGAA CGCTCGAACC TCTGTGGCCC 



+ 2ARSV RAR LLAR GGR A A I CGKY L F N WAV 
7041 CCCGGAGCGT CCGCGCTAGG CTTCTGGCCA GAGGAGGCAG GGCTGCCATA TGTGGCAAGT ACCTCTTCAA CTGGGCAGTA
GGGCCTCGCA GGCGCGATCC GAAGACCGGT CTCCTCCGTC CCGACGGTAT ACACCGTTCA TGGAGAAGTT GACCCGTCAT 
+2RTKL KLT PIA A A G Q LDL S G W F T A G Y S G
7121 AGAACAAAGC TCAAACTCAC TCCAATAGCG GCCGCTGGCC AGCTGGACTT GTCCGGCTGG TTCACGGCTG GCTACAGCGG 
TCTTGTTTCG AGTTTGAGTG AGGTTATCGC CGGCGACCGG TCGACCTGAA CAGGCCGACC AAGTGCCGAC CGATGTCGCC 



+2 GDI YHSV SHA RPR W I W F CLL L L A AGV
7201 GGGAGACATT TATCACAGCG TGTCTCATGC CCGGCCCCGC TGGATCTGGT TTTGCCTACT CCTGCTTGCT GCAGGGGTAG 
CCCTCTGTAA ATAGTGTCGC ACAGAGTACG GGCCGGGGCG ACCTAGACCA AAACGGATGA GGACGAACGA CGTCCCCATC 



+2 G I Y L LPN R
7281 GCATCTACCT CCTCCCCAAC CGATGAAGGT TGGGGTAAAC ACTCCGGCCT AAAAAAAAAA AAAAATCTAG AAAGGCGCGC 
CGTAGATGGA GGAGGGGTTG GCTACTTCCA ACCCCATTTG TGAGGCCGGA TTTTTTTTTT TTTTTAGATC TTTCCGCGCG 



BamHI MluI



7361 CAAGATATCA AGGATCCACT ACGCGTTAGA GCTCGCTGAT CAGCCTCGAC TGTGCCTTCT AGTTGCCAGC CATCTGTTGT
GTTCTATAGT TCCTAGGTGA TGCGCAATCT CGAGCGACTA GTCGGAGCTG ACACGGAAGA TCAACGGTCG GTAGACAACA 



7441 TTGCCCCTCC CCCGTGCCTT CCTTGACCCT GGAAGGTGCC ACTCCCACTG TCCTTTCCTA ATAAAATGAG GAAATTGCAT
AACGGGGAGG GGGCACGGAA GGAACTGGGA CCTTCCACGG TGAGGGTGAC AGGAAAGGAT TATTTTACTC CTTTAACGTA 



7521 CGCATTGTCT GAGTAGGTGT CATTCTATTC TGGGGGGTGG GGTGGGGCAG GACAGCAAGG GGGAGGATTG GGAAGACAAT
GCGTAACAGA CTCATCCACA GTAAGATAAG ACCCCCCACC CCACCCCGTC CTGTCGTTCC CCCTCCTAAC CCTTCTGTTA 



7601 AGCAGGCATG CTGGGGAGCT CTTCCGCTTC CTCGCTCACT GACTCGCTGC GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT
TCGTCCGTAC GACCCCTCGA GAAGGCGAAG GAGCGAGTGA CTGAGCGACG CGAGCCAGCA AGCCGACGCC GCTCGCCATA 



7681 CAGCTCACTC AAAGGCGGTA ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA ACATGTGAGC AAAAGGCCAG
GTCGAGTGAG TTTCCGCCAT TATGCCAATA GGTGTCTTAG TCCCCTATTG CGTCCTTTCT TGTACACTCG TTTTCCGGTC



7761 CAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA 
GTTTTCCGGT CCTTGGCATT TTTCCGGCGC AACGACCGCA AAAAGGTATC CGAGGCGGGG GGACTGCTCG TAGTGTTTTT 



7841 TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA TAAAGATACC AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC 
AGCTGCGAGT TCAGTCTCCA CCGCTTTGGG CTGTCCTGAT ATTTCTATGG TCCGCAAAGG GGGACCTTCG AGGGAGCACG 



7921 GCTCTCCTGT TCCGACCCTG CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA GCGTGGCGCT TTCTCAATGC
CGAGAGGACA AGGCTGGGAC GGCGAATGGC CTATGGACAG GCGGAAAGAG GGAAGCCCTT CGCACCGCGA AAGAGTTACG 



8001 TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA
AGTGCGACAT CCATAGAGTC AAGCCACATC CAGCAAGCGA GGTTCGACCC GACACACGTG CTTGGGGGGC AAGTCGGGCT 



8081 CCGCTGCGCC TTATCCGGTA ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC GCCACTGGCA GCAGCCACTG 
GGCGACGCGG AATAGGCCAT TGATAGCAGA ACTCAGGTTG GGCCATTCTG TGCTGAATAG CGGTGACCGT CGTCGGTGAC 



8161 GTAACAGGAT TAGCAGAGCG AGGTATGTAG GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA 
CATTGTCCTA ATCGTCTCGC TCCATACATC CGCCACGATG TCTCAAGAAC TTCACCACCG GATTGATGCC GATGTGATCT 



8241 AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT CCGGCAAACA 
TCCTGTCATA AACCATAGAC GCGAGACGAC TTCGGTCAAT GGAAGCCTTT TTCTCAACCA TCGAGAACTA GGCCGTTTGT 



8321 AACCACCGCT GGTAGCGGTG GTTTTTTTGT TTGCAAGCAG CAGATTACGC GCAGAAAAAA AGGATCTCAA GAAGATCCTT 
TTGGTGGCGA CCATCGCCAC CAAAAAAACA AACGTTCGTC GTCTAATGCG CGTCTTTTTT TCCTAGAGTT CTTCTAGGAA



8401 TGATCTTTTC TACGGGGTCT GACGCTCAGT GGAACGAAAA CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG 
ACTAGAAAAG ATGCCCCAGA CTGCGAGTCA CCTTGCTTTT GAGTGCAATT CCCTAAAACC AGTACTCTAA TAGTTTTTCC 
8481


ATCTTCACCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG 
TAGAAGTGGA TCTAGGAAAA TTTAATTTTT ACTTCAAAAT TTAGTTAGAT TTCATATATA CTCATTTGAA CCAGACTGTC 


8561 


TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG 
AATGGTTACG AATTAGTCAC TCCGTGGATA GAGTCGCTAG ACAGATAAAG CAAGTAGGTA TCAACGGACT GAGGGGCAGC 


8641 


TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT 
ACATCTATTG ATGCTATGCC CTCCCGAATG GTAGACCGGG GTCACGACGT TACTATGGCG CTCTGGGTGC GAGTGGCCGA 


8721 


CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA 
GGTCTAAATA GTCGTTATTT GGTCGGTCGG CCTTCCCGGC TCGCGTCTTC ACCAGGACGT TGAAATAGGC GGAGGTAGGT 


8801 


GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG 
CAGATAATTA ACAACGGCCC TTCGATCTCA TTCATCAAGC GGTCAATTAT CAAACGCGTT GCAACAACGG TAACGATGTC 


8881 


GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC 
CGTAGCACCA CAGTGCGAGC AGCAAACCAT ACCGAAGTAA GTCGAGGCCA AGGGTTGCTA GTTCCGCTCA ATGTACTAGG 


8961 


CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT 
GGGTACAACA CGTTTTTTCG CCAATCGAGG AAGCCAGGAG GCTAGCAACA GTCTTCATTC AACCGGCGTC ACAATAGTGA 


9041 


CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA
GTACCAATAC CGTCGTGACG TATTAAGAGA ATGACAGTAC GGTAGGCATT CTACGAAAAG ACACTGACCA CTCATGAGTT


9121 


CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT 
GGTTCAGTAA GACTCTTATC ACATACGCCG CTGGCTCAAC GAGAACGGGC CGCAGTTATG CCCTATTATG GCGCGGTGTA 


9201 


AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC 
TCGTCTTGAA ATTTTCACGA GTAGTAACCT TTTGCAAGAA GCCCCGCTTT TGAGAGTTCC TAGAATGGCG ACAACTCTAG 


9281 


CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA 
GTCAAGCTAC ATTGGGTGAG CACGTGGGTT GACTAGAAGT CGTAGAAAAT GAAAGTGGTC GCAAAGACCC ACTCGTTTTT 


9361 


CAGGAAGGCA AAATGCCGCA AAAAAGGGAA TAAGGGCGAC ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT 
GTCCTTCCGT TTTACGGCGT TTTTTCCCTT ATTCCCGCTG TGCCTTTACA ACTTATGAGT ATGAGAAGGA AAAAGTTATA 


9441 


TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT ACATATTTGA ATGTATTTAG AAAAATAAAC AAATAGGGGT 
ATAACTTCGT AAATAGTCCC AATAACAGAG TACTCGCCTA TGTATAAACT TACATAAATC TTTTTATTTG TTTATCCCCA 


9521 


TCCGCGCACA TTTCCCCGAA AAGTGCCACC TGACGTCTAA GAAACCATTA TTATCATGAC ATTAACCTAT AAAAATAGGC
AGGCGCGTGT AAAGGGGCTT TTCACGGTGG ACTGCAGATT CTTTGGTAAT AATAGTACTG TAATTGGATA TTTTTATCCG 


9601 


GTATCACGAG GCCCTTTCGT C 
CATAGTGCTC CGGGAAAGCA G 
1


TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA 


81 


GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA 
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT 


161 


GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG 
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC 


241 


AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA 
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT 


321 


ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA 


401 


CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT 
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA 


481 


AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT 
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA 


561 


GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT 
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA 


641 


AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC 
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC AGTTACTGCC ATTTACCGGG 


721 


GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC 
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG 


801 


CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA 
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT 


881 


TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG 
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACTGC 


961 


CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG 
GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC 


1041 


CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC 
GGTAGGTGCG ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG 


1121 


GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA 
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT


1201 


CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA 
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT 


1281 


TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT 
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA 


1361 


CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT 
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA 


1441 


TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA TCTCCGACAT
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT AGAGGCTGTA 
1521


CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA 
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 


1601 


GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA CTTAGGCACA GCACAATGCC CACCACCACC 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT GAATCCGTGT CGTGTTACGG GTGGTGGTGG 


1681 


AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT GGACGCAGAT 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA CCTGCGTCTA 


1761 


GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT 
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA 


1841 


TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG CGCCACCAGA CATAATAGCT 
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC GCGGTGGTCT GTATTATCGA 




EcoRI 


1921 


GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCAGA CTCGAGCAAG 
CTGTCTGATT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTCT GAGCTCGTTC 




XbaI BamHI MluI


2001 


TCTAGAAAGG CGCGCCAAGA TATCAAGGAT CCACTACGCG TTAGAGCTCG CTGATCAGCC TCGACTGTGC CTTCTAGTTG 
AGATCTTTCC GCGCGGTTCT ATAGTTCCTA GGTGATGCGC AATCTCGAGC GACTAGTCGG AGCTGACACG GAAGATCAAC


2081 


CCAGCCATCT GTTGTTTGCC CCTCCCCCGT GCCTTCCTTG ACCCTGGAAG GTGCCACTCC CACTGTCCTT TCCTAATAAA 
GGTCGGTAGA CAACAAACGG GGAGGGGGCA CGGAAGGAAC TGGGACCTTC CACGGTGAGG GTGACAGGAA AGGATTATTT 


2161 


ATGAGGAAAT TGCATCGCAT TGTCTGAGTA GGTGTCATTC TATTCTGGGG GGTGGGGTGG GGCAGGACAG CAAGGGGGAG 
TACTCCTTTA ACGTAGCGTA ACAGACTCAT CCACAGTAAG ATAAGACCCC CCACCCCACC CCGTCCTGTC GTTCCCCCTC 


2241 


GATTGGGAAG ACAATAGCAG GCATGCTGGG GAGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG GTCGTTCGGC 
CTAACCCTTC TGTTATCGTC CGTACGACCC CTCGAGAAGG CGAAGGAGCG AGTGACTGAG CGACGCGAGC CAGCAAGCCG 


2321 


TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG 
ACGCCGCTCG CCATAGTCGA GTGAGTTTCC GCCATTATGC CAATAGGTGT CTTAGTCCCC TATTGCGTCC TTTCTTGTAC 


2401 


TGAGCAAAAG GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA 
ACTCGTTTTC CGGTCGTTTT CCGGTCCTTG GCATTTTTCC GGCGCAACGA CCGCAAAAAG GTATCCGAGG CGGGGGGACT 


2481 


CGAGCATCAC AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG 
GCTCGTAGTG TTTTTAGCTG CGAGTTCAGT CTCCACCGCT TTGGGCTGTC CTGATATTTC TATGGTCCGC AAAGGGGGAC 




2561


GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG
CTTCGAGGGA GCACGCGAGA GGACAAGGCT GGGACGGCGA ATGGCCTATG GACAGGCGGA AAGAGGGAAG CCCTTCGCAC 


2641 


GCGCTTTCTC AATGCTCACG CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC 
CGCGAAAGAG TTACGAGTGC GACATCCATA GAGTCAAGCC ACATCCAGCA AGCGAGGTTC GACCCGACAC ACGTGCTTGG 


2721 


CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC
GGGGCAAGTC GGGCTGGCGA CGCGGAATAG GCCATTGATA GCAGAACTCA GGTTGGGCCA TTCTGTGCTG AATAGCGGTG 


2801 


TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT TCTTGAAGTG GTGGCCTAAC 
ACCGTCGTCG GTGACCATTG TCCTAATCGT CTCGCTCCAT ACATCCGCCA CGATGTCTCA AGAACTTCAC CACCGGATTG 


2881 


TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 
ATGCCGATGT GATCTTCCTG TCATAAACCA TAGACGCGAG ACGACTTCGG TCAATGGAAG CCTTTTTCTC AACCATCGAG 
2961 TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT
AACTAGGCCG TTTGTTTGGT GGCGACCATC GCCACCAAAA AAACAAACGT TCGTCGTCTA ATGCGCGTCT TTTTTTCCTA 



3041 CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC GTTAAGGGAT TTTGGTCATG 
GAGTTCTTCT AGGAAACTAG AAAAGATGCC CCAGACTGCG AGTCACCTTG CTTTTGAGTG CAATTCCCTA AAACCAGTAC 



3121 AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA 
TCTAATAGTT TTTCCTAGAA GTGGATCTAG GAAAATTTAA TTTTTACTTC AAAATTTAGT TAGATTTCAT ATATACTCAT 



3201 AACTTGGTCT GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG 
TTGAACCAGA CTGTCAATGG TTACGAATTA GTCACTCCGT GGATAGAGTC GCTAGACAGA TAAAGCAAGT AGGTATCAAC



3281 CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT GGCCCCAGTG CTGCAATGAT ACCGCGAGAC
GGACTGAGGG GCAGCACATC TATTGATGCT ATGCCCTCCC GAATGGTAGA CCGGGGTCAC GACGTTACTA TGGCGCTCTG



3361 CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT 
GGTGCGAGTG GCCGAGGTCT AAATAGTCGT TATTTGGTCG GTCGGCCTTC CCGGCTCGCG TCTTCACCAG GACGTTGAAA 



3441 ATCCGCCTCC ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG 
TAGGCGGAGG TAGGTCAGAT AATTAACAAC GGCCCTTCGA TCTCATTCAT CAAGCGGTCA ATTATCAAAC GCGTTGCAAC 



3521 TTGCCATTGC TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT CCGGTTCCCA ACGATCAAGG 
AACGGTAACG ATGTCCGTAG CACCACAGTG CGAGCAGCAA ACCATACCGA AGTAAGTCGA GGCCAAGGGT TGCTAGTTCC 



3601 CGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 
GCTCAATGTA CTAGGGGGTA CAACACGTTT TTTCGCCAAT CGAGGAAGCC AGGAGGCTAG CAACAGTCTT CATTCAACCG 



3681 CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC TTTTCTGTGA
GCGTCACAAT AGTGAGTACC AATACCGTCG TGACGTATTA AGAGAATGAC AGTACGGTAG GCATTCTACG AAAAGACACT 



3761 CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT GCCCGGCGTC AATACGGGAT
GACCACTCAT GAGTTGGTTC AGTAAGACTC TTATCACATA CGCCGCTGGC TCAACGAGAA CGGGCCGCAG TTATGCCCTA 



3841 AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT 
TTATGGCGCG GTGTATCGTC TTGAAATTTT CACGAGTAGT AACCTTTTGC AAGAAGCCCC GCTTTTGAGA GTTCCTAGAA 



3921 ACCGCTGTTG AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC ACCAGCGTTT 
TGGCGACAAC TCTAGGTCAA GCTACATTGG GTGAGCACGT GGGTTGACTA GAAGTCGTAG AAAATGAAAG TGGTCGCAAA 



4001 CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC 
GACCCACTCG TTTTTGTCCT TCCGTTTTAC GGCGTTTTTT CCCTTATTCC CGCTGTGCCT TTACAACTTA TGAGTATGAG 



4081 TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA 
AAGGAAAAAG TTATAATAAC TTCGTAAATA GTCCCAATAA CAGAGTACTC GCCTATGTAT AAACTTACAT AAATCTTTTT 



4161 TAAACAAATA GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC ATGACATTAA 
ATTTGTTTAT CCCCAAGGCG CGTGTAAAGG GGCTTTTCAC GGTGGACTGC AGATTCTTTG GTAATAATAG TACTGTAATT 



4241 CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTC 
GGATATTTTT ATCCGCATAG TGCTCCGGGA AAGCAG 
1 TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC


51 GAGACGGTCA CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG
CTCTGCCAGT GTCGAACAGA CATTCGCCTA CGGCCCTCGT CTGTTCGGGC

101 TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG
AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC



151 CGGCATCAGA GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA 
GCCGTAGTCT CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT 



StuI



201 AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG AATAGCTCAG
TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC TTATCGAGTC

251 AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA
TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT

301 TGGGGCGGAG AATGGGCGGA ACTGGGCGGG GAGGGAATTA TTGGCTATTG
ACCCCGCCTC TTACCCGCCT TGACCCGCCC CTCCCTTAAT AACCGATAAC

351 GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT
CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA

401 CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT

451 TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA TGGAGTTCCG
ATCATTAGTT AATGCCCCAG TAATCAAGTA TCGGGTATAT ACCTCAAGGC

501 CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC
GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG

551 CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA
GGGCGGGTAA CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT

601 GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT AAACTGCCCA
CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA TTTGACGGGT

651 CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG
GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC

701 TCAATGACGG TAAATGGCCC GCCTGGCATT ATGCCCAGTA CATGACCTTA
AGTTACTGCC ATTTACCGGG CGGACCGTAA TACGGGTCAT GTACTGGAAT

751 CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC
GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG

801 CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC

851 ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG
TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT AACTGCAGTT ACCCTCAAAC
901


TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC 
AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG 


951 


CCCGTTGACG CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA 
GGGCAACTGC GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT 


1001 


GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG CCATCCACGC 
CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC GGTAGGTGCG 


1051 


TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG 
ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC 


1101 


GGAACGGTGC ATTGGAACGC GGATTCCCCG TGCCAAGAGT GACGTAAGTA 
CCTTGCCACG TAACCTTGCG CCTAAGGGGC ACGGTTCTCA CTGCATTCAT 


1151 


CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA 
GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT 


1201 


CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA 
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT


1251 


TGGTATAGCT TAGCCTATAG GTGTGGGTTA TTGACCATTA TTGACCACTC 
ACCATATCGA ATCGGATATC CACACCCAAT AACTGGTAAT AACTGGTGAG 


1301 


CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG 
GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC


1351 


CCACAACTAT CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT 
GGTGTTGATA GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA 


1401 


GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT TATTTACAAA 
CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA ATAAATGTTT 


1451 


TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA 
AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT 


1501 


TAGCGTGGGA TCTCCGACAT CTCGGGTACG TGTTCCGGAC ATGGGCTCTT 
ATCGCACCCT AGAGGCTGTA GAGCCCATGC ACAAGGCCTG TACCCGAGAA 


1551 


CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA 
GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 


1601 


GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT 


1651 


CTTAGGCACA GCACAATGCC CACCACCACC AGTGTGCCGC ACAAGGCCGT 
GAATCCGTGT CGTGTTACGG GTGGTGGTGG TCACACGGCG TGTTCCGGCA 


1701 


GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT 
CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA 


1751 


GGACGCAGAT GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT 
CCTGCGTCTA CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA 


1801 


GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT TGCGGTGCTG 
CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA ACGCCACGAC 
1851 TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG
AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC 



1901 CGCCACCAGA CATAATAGCT GACAGACTAA CAGACTGTTC CTTTCCATGG 
GCGGTGGTCT GTATTATCGA CTGTCTGATT GTCTGACAAG GAAAGGTACC 



+2 M A P

EcoRI 

1951 GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCACC ATGGCGCCCA 
CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTGG TACCGCGGGT 



+2 I T A Y A Q Q TRGL LGC IIT 
2001 TCACGGCGTA CGCCCAGCAG ACAAGGGGCC TCCTAGGGTG CATAATCACC
AGTGCCGCAT GCGGGTCGTC TGTTCCCCGG AGGATCCCAC GTATTAGTGG 



+2SLTG R D K N Q V EGEV QIV 
2051 AGCCTAACTG GCCGGGACAA AAACCAAGTG GAGGGTGAGG TCCAGATTGT 
TCGGATTGAC CGGCCCTGTT TTTGGTTCAC CTCCCACTCC AGGTCTAACA 



+ 2 STA A Q T F LAT CIN GVC 
2101 GTCAACTGCT GCCCAAACCT TCCTGGCAAC GTGCATCAAT GGGGTGTGCT 
CAGTTGACGA CGGGTTTGGA AGGACCGTTG CACGTAGTTA CCCCACACGA



+2 W T V Y HGA GTRT IAS P K G 
2151 GGACTGTCTA CCACGGGGCC GGAACGAGGA CCATCGCGTC ACCCAAGGGT
CCTGACAGAT GGTGCCCCGG CCTTGCTCCT GGTAGCGCAG TGGGTTCCCA 



+2 P V I Q MYT NVD Q D L V GWP
2201 CCTGTCATCC AGATGTATAC CAATGTAGAC CAAGACCTTG TGGGCTGGCC
GGACAGTAGG TCTACATATG GTTACATCTG GTTCTGGAAC ACCCGACCGG 



+2 A S Q GTRS LTP CTC GSS 
2251 CGCTTCGCAA GGTACCCGCT CATTGACACC CTGCACTTGC GGCTCCTCGG 
GCGAAGCGTT CCATGGGCGA GTAACTGTGG GACGTGAACG CCGAGGAGCC 



+ 2DLYL VTR H A D V IPV RRR 
2301 ACCTTTACCT GGTCACGAGG CACGCCGATG TCATTCCCGT GCGCCGGCGG 
TGGAAATGGA CCAGTGCTCC GTGCGGCTAC AGTAAGGGCA CGCGGCCGCC 



+2GDSR GSL LSP RPIS YLK
2351 GGTGATAGCA GGGGCAGCCT GCTGTCGCCC CGGCCCATTT CCTACTTGAA 
CCACTATCGT CCCCGTCGGA CGACAGCGGG GCCGGGTAAA GGATGAACTT 



+2 GSS GGPL LCP AGH A V G 
2401 AGGCTCCTCG GGGGGTCCGC TGTTGTGCCC CGCGGGGCAC GCCGTGGGCA
TCCGAGGAGC CCCCCAGGCG ACAACACGGG GCGCCCCGTG CGGCACCCGT 



+2IFRA AVC TRGV AKA VDF 
2451 TATTTAGGGC CGCGGTGTGC ACCCGTGGAG TGGCTAAGGC GGTGGACTTT
ATAAATCCCG GCGCCACACG TGGGCACCTC ACCGATTCCG CCACCTGAAA 



+2IPVE NLE TTM RSPV FTD 
2501 ATCCCTGTGG AGAACCTAGA GACAACCATG AGGTCCCCGG TGTTCACGGA 
TAGGGACACC TCTTGGATCT CTGTTGGTAC TCCAGGGGCC ACAAGTGCCT
+2 N S S P P V V PQS FQV AHL
2551 TAACTCCTCT CCACCAGTAG TGCCCCAGAG CTTCCAGGTG GCTCACCTCC 
ATTGAGGAGA GGTGGTCATC ACGGGGTCTC GAAGGTCCAC CGAGTGGAGG 



+2HAPT GSG KSTK VPA AYA 
2601 ATGCTCCCAC AGGCAGCGGC AAAAGCACCA AGGTCCCGGC TGCATATGCA 
TACGAGGGTG TCCGTCGCCG TTTTCGTGGT TCCAGGGCCG ACGTATACGT 



+2AQGY KVL VLN PSVA ATL
2651 GCTCAGGGCT ATAAGGTGCT AGTACTCAAC CCCTCTGTTG CTGCAACACT 
CGAGTCCCGA TATTCCACGA TCATGAGTTG GGGAGACAAC GACGTTGTGA 



+ 2 GFG AYMS KAH GID PNI 
2701 GGGCTTTGGT GCTTACATGT CCAAGGCTCA TGGGATCGAT CCTAACATCA 
CCCGAAACCA CGAATGTACA GGTTCCGAGT ACCCTAGCTA GGATTGTAGT 



+2 R T G V RTI TTGS PIT YST
2751 GGACCGGGGT GAGAACAATT ACCACTGGCA GCCCCATCAC GTACTCCACC 
CCTGGCCCCA CTCTTGTTAA TGGTGACCGT CGGGGTAGTG CATGAGGTGG 



+2YGKF LAD GGC SGGA YDI 
2801 TACGGCAAGT TCCTTGCCGA CGGCGGGTGC TCGGGGGGCG CTTATGACAT 
ATGCCGTTCA AGGAACGGCT GCCGCCCACG AGCCCCCCGC GAATACTGTA 



+2 IIC DECH STD ATS ILG 
2851 AATAATTTGT GACGAGTGCC ACTCCACGGA TGCCACATCC ATCTTGGGCA 
TTATTAAACA CTGCTCACGG TGAGGTGCCT ACGGTGTAGG TAGAACCCGT 



+2 IGTV LDQ AETA GAR LVV
2901 TTGGCACTGT CCTTGACCAA GCAGAGACTG CGGGGGCGAG ACTGGTTGTG 
AACCGTGACA GGAACTGGTT CGTCTCTGAC GCCCCCGCTC TGACCAACAC 



+ 2 LATA TPP GSV TVPH PNI 
2951 CTCGCCACCG CCACCCCTCC GGGCTCCGTC ACTGTGCCCC ATCCCAACAT 
GAGCGGTGGC GGTGGGGAGG CCCGAGGCAG TGACACGGGG TAGGGTTGTA 



+ 2 E E V ALST TGE I P F YGK 
3001 CGAGGAGGTT GCTCTGTCCA CCACCGGAGA GATCCCTTTT TACGGCAAGG
GCTCCTCCAA CGAGACAGGT GGTGGCCTCT CTAGGGAAAA ATGCCGTTCC 



+ 2AIPL EVI KGGR HLI FCH 
3051 CTATCCCCCT CGAAGTAATC AAGGGGGGGA GACATCTCAT CTTCTGTCAT 
GATAGGGGGA GCTTCATTAG TTCCCCCCCT CTGTAGAGTA GAAGACAGTA 



+2SKKK CDE L A A KLVA L G I 
3101 TCAAAGAAGA AGTGCGACGA ACTCGCCGCA AAGCTGGTCG CATTGGGCAT 
AGTTTCTTCT TCACGCTGCT TGAGCGGCGT TTCGACCAGC GTAACCCGTA 



+2 N A V AYYR GLD VSV IPT 
3151 CAATGCCGTG GCCTACTACC GCGGTCTTGA CGTGTCCGTC ATCCCGACCA 
GTTACGGCAC CGGATGATGG CGCCAGAACT GCACAGGCAG TAGGGCTGGT 



+2SGDV V V V ATDA LMT GYT 
3201 GCGGCGATGT TGTCGTCGTG GCAACCGATG CCCTCATGAC CGGCTATACC 
CGCCGCTACA ACAGCAGCAC CGTTGGCTAC GGGAGTACTG GCCGATATGG 



28/100 



WO 01/38360 


PCT/US00/32326 




PCMV-NS34A 






FIGURE 9 - Page 5 




+2 
3251 


GDFD SVI DCN TCVT QTV 
GGCGACTTCG ACTCGGTGAT AGACTGCAAT ACGTGTGTCA CCCAGACAGT 
CCGCTGAAGC TGAGCCACTA TCTGACGTTA TGCACACAGT GGGTCTGTCA 




+2 
3301 


DFS LDPT F T I ETI T L P 
CGATTTCAGC CTTGACCCTA CCTTCACCAT TGAGACAATC ACGCTCCCCC 
GCTAAAGTCG GAACTGGGAT GGAAGTGGTA ACTCTGTTAG TGCGAGGGGG 




+2 
3351 


QDAV SRT QRRG RTG RGK 
AAGATGCTGT CTCCCGCACT CAACGTCGGG GCAGGACTGG CAGGGGGAAG 
TTCTACGACA GAGGGCGTGA GTTGCAGCCC CGTCCTGACC GTCCCCCTTC 




+ 2 
3401 


P G I Y RFV APG ERPS GMF 
CCAGGCATCT ACAGATTTGT GGCACCGGGG GAGCGCCCCT CCGGCATGTT 
GGTCCGTAGA TGTCTAAACA CCGTGGCCCC CTCGCGGGGA GGCCGTACAA 




+ 2 
3451 


DSS VLCE CYD AGC A W Y 
CGACTCGTCC GTCCTCTGTG AGTGCTATGA CGCAGGCTGT GCTTGGTATG 
GCTGAGCAGG CAGGAGACAC TCACGATACT GCGTCCGACA CGAACCATAC 




+ 2 
3501 


E L T P AET TVRL RAY MNT 
AGCTCACGCC CGCCGAGACT ACAGTTAGGC TACGAGCGTA CATGAACACC 
TCGAGTGCGG GCGGCTCTGA TGTCAATCCG ATGCTCGCAT GTACTTGTGG 




+ 2 
3551 


PGLP VCQ DHL EFWE GVF 
CCGGGGCTTC CCGTGTGCCA GGACCATCTT GAATTTTGGG AGGGCGTCTT 
GGCCCCGAAG GGCACACGGT CCTGGTAGAA CTTAAAACCC TCCCGCAGAA 




+2 


TGL THID AHF L S Q TKQ
StuI 




3601 


TACAGGCCTC ACTCATATAG ATGCCCACTT TCTATCCCAG ACAAAGCAGA 
ATGTCCGGAG TGAGTATATC TACGGGTGAA AGATAGGGTC TGTTTCGTCT




+2 
3651 


SGEN LPY L V A Y QAT VCA 
GTGGGGAGAA CCTTCCTTAC CTGGTAGCGT ACCAAGCCAC CGTGTGCGCT 
CACCCCTCTT GGAAGGAATG GACCATCGCA TGGTTCGGTG GCACACGCGA 




+ 2 
3701 


R A Q A PPP SWD QMWK CLI 
AGGGCTCAAG CCCCTCCCCC ATCGTGGGAC CAGATGTGGA AGTGTTTGAT 
TCCCGAGTTC GGGGAGGGGG TAGCACCCTG GTCTACACCT TCACAAACTA 




+ 2 
3751 


R L K PTLH GPT PLL YRL 
TCGCCTCAAG CCCACCCTCC ATGGGCCAAC ACCCCTGCTA TACAGACTGG 
AGCGGAGTTC GGGTGGGAGG TACCCGGTTG TGGGGACGAT ATGTCTGACC




+ 2 
3801


G A V Q NEI TLTH PVT KYI 
GCGCTGTTCA GAATGAAATC ACCCTGACGC ACCCAGTCAC CAAATACATC
CGCGACAAGT CTTACTTTAG TGGGACTGCG TGGGTCAGTG GTTTATGTAG 




+2 
3851 


MTCM SAD L EV VTST WVL 
ATGACATGCA TGTCGGCCGA CCTGGAGGTC GTCACGAGCA CCTGGGTGCT
TACTGTACGT ACAGCCGGCT GGACCTCCAG CAGTGCTCGT GGACCCACGA 




+2
3901 


VGG VLAA L A A YCL STG 
CGTTGGCGGC GTCCTGGCTG CTTTGGCCGC GTATTGCCTG TCAACAGGCT 
GCAACCGCCG CAGGACCGAC GAAACCGGCG CATAACGGAC AGTTGTCCGA
+2 C V V I V G R VVLS GKP A I I
3951 GCGTGGTCAT AGTGGGCAGG GTCGTCTTGT CCGGGAAGCC GGCAATCATA 
CGCACCAGTA TCACCCGTCC CAGCAGAACA GGCCCTTCGG CCGTTAGTAT 



+ 2PDRE VLY REF DEME EC 
4001 CCTGACAGGG AAGTCCTCTA CCGAGAGTTC GATGAGATGG AAGAGTGCTA
GGACTGTCCC TTCAGGAGAT GGCTCTCAAG CTACTCTACC TTCTCACGAT 



BamHI MluI



4051 GGATCCACTA CGCGTTAGAG CTCGCTGATC AGCCTCGACT GTGCCTTCTA
CCTAGGTGAT GCGCAATCTC GAGCGACTAG TCGGAGCTGA CACGGAAGAT 



4101 GTTGCCAGCC ATCTGTTGTT TGCCCCTCCC CCGTGCCTTC CTTGACCCTG 
CAACGGTCGG TAGACAACAA ACGGGGAGGG GGCACGGAAG GAACTGGGAC 



4151 GAAGGTGCCA CTCCCACTGT CCTTTCCTAA TAAAATGAGG AAATTGCATC 
CTTCCACGGT GAGGGTGACA GGAAAGGATT ATTTTACTCC TTTAACGTAG 



4201 GCATTGTCTG AGTAGGTGTC ATTCTATTCT GGGGGGTGGG GTGGGGCAGG 
CGTAACAGAC TCATCCACAG TAAGATAAGA CCCCCCACCC CACCCCGTCC 



4251 ACAGCAAGGG GGAGGATTGG GAAGACAATA GCAGGCATGC TGGGGAGCTC 
TGTCGTTCCC CCTCCTAACC CTTCTGTTAT CGTCCGTACG ACCCCTCGAG 



4301 TTCCGCTTCC TCGCTCACTG ACTCGCTGCG CTCGGTCGTT CGGCTGCGGC 
AAGGCGAAGG AGCGAGTGAC TGAGCGACGC GAGCCAGCAA GCCGACGCCG 



4351 GAGCGGTATC AGCTCACTCA AAGGCGGTAA TACGGTTATC CACAGAATCA 
CTCGCCATAG TCGAGTGAGT TTCCGCCATT ATGCCAATAG GTGTCTTAGT 



4401 GGGGATAACG CAGGAAAGAA CATGTGAGCA AAAGGCCAGC AAAAGGCCAG 
CCCCTATTGC GTCCTTTCTT GTACACTCGT TTTCCGGTCG TTTTCCGGTC 



4451 GAACCGTAAA AAGGCCGCGT TGCTGGCGTT TTTCCATAGG CTCCGCCCCC 
CTTGGCATTT TTCCGGCGCA ACGACCGCAA AAAGGTATCC GAGGCGGGGG 



4501 CTGACGAGCA TCACAAAAAT CGACGCTCAA GTCAGAGGTG GCGAAACCCG 
GACTGCTCGT AGTGTTTTTA GCTGCGAGTT CAGTCTCCAC CGCTTTGGGC 



4551 ACAGGACTAT AAAGATACCA GGCGTTTCCC CCTGGAAGCT CCCTCGTGCG 
TGTCCTGATA TTTCTATGGT CCGCAAAGGG GGACCTTCGA GGGAGCACGC 



4601 CTCTCCTGTT CCGACCCTGC CGCTTACCGG ATACCTGTCC GCCTTTCTCC 
GAGAGGACAA GGCTGGGACG GCGAATGGCC TATGGACAGG CGGAAAGAGG 



4651 CTTCGGGAAG CGTGGCGCTT TCTCAATGCT CACGCTGTAG GTATCTCAGT
GAAGCCCTTC GCACCGCGAA AGAGTTACGA GTGCGACATC CATAGAGTCA 



4701 TCGGTGTAGG TCGTTCGCTC CAAGCTGGGC TGTGTGCACG AACCCCCCGT
AGCCACATCC AGCAAGCGAG GTTCGACCCG ACACACGTGC TTGGGGGGCA 



4751 TCAGCCCGAC CGCTGCGCCT TATCCGGTAA CTATCGTCTT GAGTCCAACC 
AGTCGGGCTG GCGACGCGGA ATAGGCCATT GATAGCAGAA CTCAGGTTGG



4801 CGGTAAGACA CGACTTATCG CCACTGGCAG CAGCCACTGG TAACAGGATT 
GCCATTCTGT GCTGAATAGC GGTGACCGTC GTCGGTGACC ATTGTCCTAA 
4851 AGCAGAGCGA GGTATGTAGG CGGTGCTACA GAGTTCTTGA AGTGGTGGCC
TCGTCTCGCT CCATACATCC GCCACGATGT CTCAAGAACT TCACCACCGG


4901 


TAACTACGGC TACACTAGAA GGACAGTATT TGGTATCTGC GCTCTGCTGA 
ATTGATGCCG ATGTGATCTT CCTGTCATAA ACCATAGACG CGAGACGACT 


4951 


AGCCAGTTAC 
TCGGTCAATG 


CTTCGGAAAA 
GAAGCCTTTT 


AGAGTTGGTA 
TCTCAACCAT 


CGAGAACTAG GCCGTTTGTT 


5001 


ACCACCGCTG 
TGGTGGCGAC 


GTAGCGGTGG 
CATCGCCACC 


TTTTTTTGTT 
AAAAAAACAA 


TGCAAGCAGC AGATTACGCG 
ACGTTCGTCG TCTAATGCGC 


5051 


CAGAAAAAAA 
GTCTTTTTTT 


GGATCTCAAG 
CCTAGAGTTC 


AAGATCCTTT 
TTCTAGGAAA 


GATCTTTTCT ACGGGGTCTG 
CTAGAAAAGA TGCCCCAGAC 


5101 


ACGCTCAGTG 
TGCGAGTCAC 


GAACGAAAAC 
CTTGCTTTTG 


TCACGTTAAG 
AGTGCAATTC 


GGATTTTGGT CATGAGATTA 
CCTAAAACCA GTACTCTAAT 


5151 


TCAAAAAGGA 
AGTTTTTCCT 


TCTTCACCTA 
AGAAGTGGAT 


GATCCTTTTA 
CTAGGAAAAT 


AATTAAAAAT GAAGTTTTAA 
TTAATTTTTA CTTCAAAATT 


5201 


ATCAATCTAA 
TAGTTAGATT 


AGTATATATG 
TCATATATAC 


AGTAAACTTG 
TCATTTGAAC 


GTCTGACAGT TACCAATGCT 
CAGACTGTCA ATGGTTACGA 


5251 


TAATCAGTGA 
ATTAGTCACT 


GGCACCTATC 
CCGTGGATAG 


TCAGCGATCT 
AGTCGCTAGA 


GTCTATTTCG TTCATCCATA 
CAGATAAAGC AAGTAGGTAT 


5301 GTTGCCTGAC TCCCCGTCGT GTAGATAACT ACGATACGGG AGGGCTTACC
CAACGGACTG AGGGGCAGCA CATCTATTGA TGCTATGCCC TCCCGAATGG


5351 ATCTGGCCCC AGTGCTGCAA TGATACCGCG AGACCCACGC TCACCGGCTC
TAGACCGGGG TCACGACGTT ACTATGGCGC TCTGGGTGCG AGTGGCCGAG


5401 CAGATTTATC AGCAATAAAC CAGCCAGCCG GAAGGGCCGA GCGCAGAAGT
GTCTAAATAG TCGTTATTTG GTCGGTCGGC CTTCCCGGCT CGCGTCTTCA


5451 GGTCCTGCAA CTTTATCCGC CTCCATCCAG TCTATTAATT GTTGCCGGGA
CCAGGACGTT GAAATAGGCG GAGGTAGGTC AGATAATTAA CAACGGCCCT


5501 AGCTAGAGTA AGTAGTTCGC CAGTTAATAG TTTGCGCAAC GTTGTTGCCA
TCGATCTCAT TCATCAAGCG GTCAATTATC AAACGCGTTG CAACAACGGT


5551 TTGCTACAGG CATCGTGGTG TCACGCTCGT CGTTTGGTAT GGCTTCATTC
AACGATGTCC GTAGCACCAC AGTGCGAGCA GCAAACCATA CCGAAGTAAG


5601 AGCTCCGGTT CCCAACGATC AAGGCGAGTT ACATGATCCC CCATGTTGTG
TCGAGGCCAA GGGTTGCTAG TTCCGCTCAA TGTACTAGGG GGTACAACAC


5651 CAAAAAAGCG GTTAGCTCCT TCGGTCCTCC GATCGTTGTC AGAAGTAAGT
GTTTTTTCGC CAATCGAGGA AGCCAGGAGG CTAGCAACAG TCTTCATTCA


5701 TGGCCGCAGT GTTATCACTC ATGGTTATGG CAGCACTGCA TAATTCTCTT
ACCGGCGTCA CAATAGTGAG TACCAATACC GTCGTGACGT ATTAAGAGAA


5751 ACTGTCATGC CATCCGTAAG ATGCTTTTCT GTGACTGGTG AGTACTCAAC
TGACAGTACG GTAGGCATTC TACGAAAAGA CACTGACCAC TCATGAGTTG



5801 CAAGTCATTC TGAGAATAGT GTATGCGGCG ACCGAGTTGC TCTTGCCCGG
GTTCAGTAAG ACTCTTATCA CATACGCCGC TGGCTCAACG AGAACGGGCC


5851 CGTCAATACG GGATAATACC GCGCCACATA GCAGAACTTT AAAAGTGCTC
GCAGTTATGC CCTATTATGG CGCGGTGTAT CGTCTTGAAA TTTTCACGAG


5901 ATCATTGGAA AACGTTCTTC GGGGCGAAAA CTCTCAAGGA TCTTACCGCT
TAGTAACCTT TTGCAAGAAG CCCCGCTTTT GAGAGTTCCT AGAATGGCGA


5951 GTTGAGATCC AGTTCGATGT AACCCACTCG TGCACCCAAC TGATCTTCAG
CAACTCTAGG TCAAGCTACA TTGGGTGAGC ACGTGGGTTG ACTAGAAGTC


6001 CATCTTTTAC TTTCACCAGC GTTTCTGGGT GAGCAAAAAC AGGAAGGCAA
GTAGAAAATG AAAGTGGTCG CAAAGACCCA CTCGTTTTTG TCCTTCCGTT


6051 AATGCCGCAA AAAAGGGAAT AAGGGCGACA CGGAAATGTT GAATACTCAT
TTACGGCGTT TTTTCCCTTA TTCCCGCTGT GCCTTTACAA CTTATGAGTA


6101 ACTCTTCCTT TTTCAATATT ATTGAAGCAT TTATCAGGGT TATTGTCTCA
TGAGAAGGAA AAAGTTATAA TAACTTCGTA AATAGTCCCA ATAACAGAGT


6151 TGAGCGGATA CATATTTGAA TGTATTTAGA AAAATAAACA AATAGGGGTT
ACTCGCCTAT GTATAAACTT ACATAAATCT TTTTATTTGT TTATCCCCAA


6201 CCGCGCACAT TTCCCCGAAA AGTGCCACCT GACGTCTAAG AAACCATTAT
GGCGCGTGTA AAGGGGCTTT TCACGGTGGA CTGCAGATTC TTTGGTAATA


6251 TATCATGACA TTAACCTATA AAAATAGGCG TATCACGAGG CCCTTTCGTC
ATAGTACTGT AATTGGATAT TTTTATCCGC ATAGTGCTCC GGGAAAGCAG

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuVal
2 AGCTTACAAAACAAATTCACCATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTA 
TCGAATGTTTTGTTTAAGTGGTACCGACGTATACGTCGAGTCCCGATATTCCACGATCAT

1 HIND3, 21 NCOI, 30 NDEI, 58 SCAI,

LeuAsnProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGly
62 CTCAACCCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGG 
GAGTTGGGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCC 

IleAspProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyr
122 ATCGATCCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTAC
TAGCTAGGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATG 

122 CLAI, 

SerThrTyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIle
182 TCCACCTACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATA 
AGGTGGATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTAT 

IleCysAspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeu
242 ATTTGTGACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTT 
TAAACACTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAA 

AspGlnAlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGly 
302 GACCAAGCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGC 
CTGGTTCGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCG 

A 

309 ALWN1, 

SerValThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIle
362 TCCGTCACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATC 
AGGCAGTGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAG 

ProPheTyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePhe
422 CCTTTTTACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTC
GGAAAAATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAG 

CysHisSerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsn
482 TGTCATTCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAAT
ACAGTAAGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTA 

AlaValAlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValVal
542 GCCGTGGCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTC
CGGCACCGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAG 

A A 

556 SAC2, 566 DRD1,

ValValAlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAsp
602 GTCGTGGCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGAC 
CAGCACCGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTG 

A 

621 BSPH1, 

CysAsnThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGlu
662 TGCAATACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAG
ACGTTATGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTC 

ThrIleThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArg
722 ACAATCACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGG 
TGTTAGTGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCC 

GlyLysProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAsp
782 GGGAAGCCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGAC
CCCTTCGGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTG 

A A 

822 BGLI, 839 DRD1,

SerSerValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAla 
842 TCGTCCGTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCC 
AGCAGGCAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGG 

887 SACI, 

GluThrThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAsp 
902 GAGACTACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGAC 
CTCTGATGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTG 

A 

937 SMAI XMAI, 

HisLeuGluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeu 
962 CATCTTGAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTA 
GTAGAACTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGAT 

a 

991 STUI, 

SerGlnThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrVal 
1022 TCCCAGACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTG
AGGGTCTGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCAC

a 

1075 DRA3, 

CysAlaArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArg
1082 TGCGCTAGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGC 
ACGCGATCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCG 

LeuLysProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsn 
1142 CTCAAGCCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAAT
GAGTTCGGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTA 

A 

1156 NCOI, 

GluIleThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeu
1202 GAAATCACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTG
CTTTAGTGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGAC 

AAA A A 

1236 BSPH1, 1240 DRD1, 1243 AVA3, 1251 EAG1 XMA3, 1256 DRD1, 



GluValValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyr 
1262 GAGGTCGTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTAT 
CTCCAGCAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATA

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu 
1055 1060 1065 

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His 
1280 1285 1290

CysLeuSerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAla
1322 TGCCTGTCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCA 
ACGGACAGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGT 

1375 NAEI, 

IleIleProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGln
1382 ATCATACCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAG 
TAGTATGGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTC 

1391 DRD1, 

HisLeuProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeu
1442 CACTTACCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTC
GTGAATGGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAG 

GlyLeuLeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsn
1502 GGCCTCCTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAAC
CCGGAGGACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTG

1508 PSTI, 1513 TTH3I, 

TrpGlnLysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGln
1562 TGGCAAAAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAA 
ACCGTTTTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTT 

A A 

1571 XHOI, 1592 NDEI, 

TyrLeuAlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPhe
1622 TACTTGGCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTT 
ATGAACCGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAA 

A 

1649 BSTE2, 

ThrAlaAlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGly
1682 ACAGCTGCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGG 
TGTCGACGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCC 

1683 ALWN1 PVU2, 

GlyTrpValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGly 
1742 GGGTGGGTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGC
CCCACCCACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCG 

A 

1800 ESP1, 

LeuAlaGlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAla
1802 TTAGCTGGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCA 
AATCGACCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGT 

1808 KAS1 NARI, 

GlyTyrGlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluVal 
1862 GGGTATGGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTC 
CCCATACCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAG

1884 SACI, 1905 BSPH1,



ProSerThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuVal
1922 CCCTCCACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTA 
GGGAGGTGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCAT 

. ' A 



1934 TTH3I, 



ValGlyValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaVal
1982 GTCGGCGTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTG 
CAGCCGCACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCAC 



2010 NAEI, 2023 SMAI XMAI, 



GlnTrpMetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHis 
2042 CAGTGGATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCAC
GTCACCTACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTG 



2073 SMAI XMAI, 2099 DRA3, 



TyrValProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrVal
2102 TACGTGCCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTA 
ATGCACGGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACAT 



2121 PVU2, 



ThrGlnLeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSer 
2162 ACCCAGCTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCC 
TGGGTCGAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGG 



2165 ALWN1, 2170 MST2, 



GlySerTrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThr 
2222 GGTTCCTGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACC 
CCAAGGACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGG



2226 ECON1, 



TrpLeuLysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArg
2282 TGGCTAAAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGC 
ACCGATTTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCG 

A A /S 

2291 ESP1, 2306 PVU2, 2316 BAMHI, 

GlyTyrLysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAla
2342 GGGTATAAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCT 
CCCATATTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGA 

GluIleThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArg
2402 GAGATCACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGG 
CTCTAGTGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCC 

A AAA 

2431 BSAB1, 2447 AVR2, 2454 SSE83871, 2455 PSTI, 

AsnMetTrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeu 
2462 AACATGTGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTT
TTGTACACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAA

2486 ASE1, 2503 APAI,

ProAlaProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIle
2522 CCTGCGCCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATA
GGACGCGGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTAT

a 

2559 PSTI, 

ArgGlnValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysPro
2582 AGGCAGGTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCG
TCCGTCCACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGC
a 

2600 DRA3, 

CysGlnValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPhe 
2642 TGCCAGGTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTT 
ACGGTCCAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAA 

AlaProProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGlu 
2702 GCGCCCCCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAA 
CGCGGGGGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTT 

TyrProValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSer
2762 TACCCGGTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCC
ATGGGCCATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGG 

2763 HGIE2, 2815 AAT2, 

MetLeuThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGly 
2822 ATGCTCACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGA
TACGAGTGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCT 

2856 EAG1 XMA3, 

SerProProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAla 
2882 TCACCCCCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCA
AGTGGGGGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGT 

A A 

2895 BALI, 2909 NHEI, 

ThrCysThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrp 
2942 ACTTGCACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGG 
TGAACGTGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACC 

A A 

2972 ESP1, 2975 SACI, 

ArgGlnGluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeu
3002 AGGCAGGAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTG 
TCCGTCCTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGAC 

AspSerPheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGlu 
3062 GACTCCTTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAA 
CTGAGGAAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTT 

3102 BGL2,

IleLeuArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyr
3122 ATCCTGCGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTAT 
TAGGACGCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATA 

* A 

3149 ALWN1, 3170 EAG1 XMA3, 

AsnProProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGly 
3182 AACCCCCCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGC 
TTGGGGGGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCG 

3223 HGIE2, 3235 NCOI, 

CysProLeuProProProLysSerProProValProProProArgLysLysArgThrVal 
3242 TGCCCGCTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTG 
ACGGGCGAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCAC 

ValLeuThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGly 
3302 GTCCTCACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGC 
CAGGAGTGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCG 

A /% 

3338 SACI, 3352 HIND3, 

SerSerSerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaPro
3362 AGCTCCTCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCT 
TCGAGGAGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGA 

SerGlyCysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGly
3422 TCTGGCTGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGG
AGACCGACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCC

a 

3443 EAM11051, 

GluProGlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsn 
3482 GAGCCTGGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAAC
CTCGGACCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTG 

A A A 

3490 BAMHI, 3491 BSAB1, 3493 BSPE1, 

AlaGluAspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrPro 
3542 GCGGAGGATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCG 
CGCCTCCTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGC 

A 

3595 DRA3, 

CysAlaAlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHis 
3602 TGCGCCGCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCAC 
ACGCGGCGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTG 

A A A 

3606 SAC2, 3617 ALWN1, 3661 PFLM1,

HisAsnLeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThr 
3662 CACAATTTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACA 
GTGTTAAACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGT 

A 

3687 DRA3, 

PheAspArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAla
3722 TTTGACAGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCA
AAACTGTCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGT 

AlaAlaSerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrPro
3782 GCGGCGTCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCC 
CGCCGCAGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGG 

A 

3822 HIND3, 

ProHisSerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArg 
3842 CCACACTCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGA 
GGTGTGAGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCT 

3881 AAT2, 3896 BGLI,

LysAlaValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrPro
3902 AAGGCCGTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCA 
TTCCGGCATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGT 

IleAspThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGly
3962 ATAGACACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGT 
TATCTGTGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCA 

ArgLysProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMet 
4022 CGTAAGCCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATG
GCATTCGGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTAC 

AlaLeuTyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPhe 
4082 GCTTTGTACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTC
CGAAACATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAG 

GlnTyrSerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThr
4142 CAATACTCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACC
GTTATGAGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGG

A 

4166 ECORI, 

ProMetGlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIle 
4202 CCAATGGGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATC
GGTTACCCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAG

: r * a 

4235 DRD1, 4242 ALWN1, 

ArgThrGluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIle
4262 CGTACGGAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATC 
GCATGCCTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAG 

A A 

4307 BGLI, 4314 BALI,

LysSerLeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsn
4322 AAGTCCCTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAAC
TTCAGGGAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTG 

A 

4351 APAI,

CysGlyTyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeu 
4382 TGCGGCTATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTC
ACGCCGATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAG

ThrCysTyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMet
4442 ACTTGCTACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATG
TGAACGATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTAC
a 

4458 SMAI XMAI, 

LeuValCysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAla
4502 CTCGTGTGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCG
GAGCACACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGC 

A A 

4514 DRD1, 4517 TTH3I, 

AlaSerLeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspPro 
4562 GCGAGCCTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCC
CGCTCGGACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGG 

ProGlnProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAla 
4622 CCACAACCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCC
GGTGTTGGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGG 

4643 SACI, 

HisAspGlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAla
4682 CACGACGGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCG
GTGCTGCCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGC

a 

4737 NRUI, 

ArgAlaAlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIle
4742 AGAGCTGCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATC
TCTCGACGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAG

MetPheAlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeu
4802 ATGTTTGCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTT 
TACAAACGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAA 

A A 

4812 PFLM1, 4813 DRA3, 

IleAlaArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSer
4862 ATAGCCAGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCC
TATCGGTCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGG 

A 

4899 BGL2,

IleGluProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSer
4922 ATAGAACCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCA
TATCTTGGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGT

A 

4960 NCOI,

LeuHisSerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGly
4982 CTCCACAGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGG
GAGGTGTCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCC 

A A 

5021 SPHI, 5041 KPNI,

ValProProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAla
5042 GTACCGCCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCC 
CATGGCGGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGG 

A A 

5070 APAI, 5097 BALI, 

ArgGlyGlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLys
5102 AGAGGAGGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAG 
TCTCCTCCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTC 

A 

5119 NDEI, 

LeuLysLeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAla 
5162 CTCAAACTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCT
GAGTTTGAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGA 

A A A A 

5180 NOTI, 5181 EAG1 XMA3, 5188 BALI, 5192 PVU2,

GlyTyrSerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrp 
5222 GGCTACAGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGG 
CCGATGTCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACC 

A 

5246 DRA3, 

PheCysLeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgOP
5282 TTTTGCCTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGATGAAGG
AAAACGGATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTACTTCC 

A 

5301 PSTI, 5331 HGIE2, 



5342 TTGGGGTAAACACTCCGGCCTAAAAAAAAAAAAAAATCTAGAACCCGAGTCGAC 
AACCCGATTTGTGAGGCCGGATTTTTTTTTTTTTTTAGATCTTGGGCTCAGCTG 

a /\ 

5378 XBAI, 5390 SALI,

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

A A A 

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

A 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA 
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

A A 

550 SAC2, 560 DRD1, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA 

A 

615 BSPH1,

662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRD1,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT 
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC 

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

A A ^ A A 

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG 
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 



46/100 



WO 01/38360 



PCT/US00/32326



FIGURE 14 - Page 3 



SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT

1369 NAEI,

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

A 

1385 DRD1, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

a * 

1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla 
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

A 

1794 ESP1, 

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPH1,

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

A A 

2004 NAEI, 2017 SMAI XMAI,

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

A 

2067 SMAI XMAI, 2093 DRA3,

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWN1,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

A 

A 

2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT

TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

2285 ESP1, 2300 PVU2, 2310 BAMHI,

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A A A A 

2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A A 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2 , 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

2850 EAG1 XMA3, 

Q 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

2889 BALI, 2903 NHEI, 

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

2966 ESP1, 2969 SACI, 

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2, 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG

A~ A 

3143 ALWN1, 3164 EAG1 XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

/\ A 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

A A 

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM11051, 

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

A A A 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 

A A 

3589 DRA3, 3600 SAC2, 

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

A A 

3611 ALWN1, 3655 PFLM1 , 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

A 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 
SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

«. • a 

3816 HIND3,

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

3875 AAT2 , 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 

4229 DRD1, 4236 ALWN1, 

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI , 4308 BALI, 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

At 

4345 APAI, 

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 
TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC
a 

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRD1, 4511 TTH3I,

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

A 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

A 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLM1, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

A 

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

A 

4954 NCOI, 

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT

5064 APAI, 5091 BALI , 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgOP
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGATGAATAGTCGAC 
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTACTTATCAGCTG 

5295 PSTI, 5336 SALI,
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI , 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlylleAsp 
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnlleArgThrGlyValArgThrlleThrThrGlySerProIleThrTyrSerThr 
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT 
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnlleGluGluValAlaLeuSerThrThrGlyGluIleProPhe 
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlalleProLeuGluVallleLysGlyGlyArgHisLeuIlePheCysHis 
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 
SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC

AlaTyrTyrArgGlyLeuAspValSerVallleProThrSerGlyAspValValValVal 
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG 
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

550 SAC2, 560 DRD1, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA

615 BSPH1, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrlleGluThrlle 
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys 
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlylleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer 
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC 
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRD1 , 

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT 
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

A 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC 
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

A, A A A A 

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1, 



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValVallleValGlyArgValValLeuSerGlyLysProAlallelle 
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

A 

1369 NAEI , 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

A 

1385 DRD1, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluVallleAlaProAlaValGlnThrAsnTrpGln 
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

/V A 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

A. A 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlalleAlaSerLeuMetAlaPheThrAla 
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

A 

1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnlleLeuGlyGlyTrp 
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 
ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT 

CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

a 

1794 ESP1, 

GlyAlaAlalleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr 

1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT

CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 
a 

1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPH1,

ThrGluAspLeuValAsnLeuLeuProAlalleLeuSerProGlyAlaLeuValValGly 
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

A 

1928 TTH3I , 

ValValCysAlaAlalleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp 
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

A A 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

A A 

2067 SMAI XMAI , 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

A A 

2115 PVU2, 2159 ALWN1, 

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

2164 MST2 , 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlylleProPheValSerCysGlnArgGlyTyr 
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

A A A 

2285 ESP1, 2300 PVU2, 2310 BAMHI, 



60/100 



WO 01/38360 



PCT/US00/32326 



FIGURE 17 - Page 5 



LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArglleValGlyProArgThrCysArgAsnMet 
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A A A A 

2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

A 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

A 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC 
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

A 

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

A 

2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

A A 

2889 BALI , 2903 NHEI, 
ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln

2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG 

TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC

a a 

2966 ESP1, 2969 SACI, 

GluMetGlyGlyAsnlleThrArgValGluSerGluAsnLysValVallleLeuAspSer 
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu

3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 

AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

a 

3096 BGL2 , 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

3143 ALWN1, 3164 EAG1 XMA3,

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

a a 

3217 HGIE2 , 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

A A 

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

A 

3437 EAM11051,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

A A A 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 
3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

3611 ALWN1 , 3655 PFLM1, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

a 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

A 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 

3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC 

AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

a a 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 

3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC

CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrlleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys 
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 

4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG

GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC 
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG 
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

A 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 
4229 DRD1, 4236 ALWN1,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC 

CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

■'■ 1 a a 

4301 BGLI, 4308 BALI, 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

A 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

A A 

4508 DRD1 , 4511 TTH3I , 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

A 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

A 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

A A 

4806 PFLM1, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

A 

4893 BGL2,

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A • A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA 
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

A A 

5064 APAI, 5091 BALI, 

GlyArgAlaAlalleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys 
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

A 

5113 NDEI,

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

A 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

A 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

/\ A A /\ 

5380 NOTI, 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI, 

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG 
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC 
5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

/N A A A 

5548 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI, 

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysOC AM 
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGTAATAGTCG
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCATTATCAGC 

5650 APAI, 5698 SALI, 
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT 
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

550 SAC2, 560 DRD1,

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

A A 

816 BGLI, 833 DRD1, 

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

A 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

A 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

A 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC 

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

A 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

AAA A A 

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG 
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

At 

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT

1385 DRD1, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

1643 BSTE2, 1677 ALWN1 PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla 
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESP1, 

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPH1, 

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

a 

1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

A ^ 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

A 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC

A * 

2115 PVU2, 2159 ALWN1,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

A 

2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

A A A 

2285 ESP1, 2300 PVU2, 2310 BAMHI, 

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A A A A 

2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A A 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC

2553 PSTI,

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

A 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC

/\ 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

/\ 

2809 AAT2,

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

2889 BALI, 2903 NHEI, 

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

* A 

2966 ESP1, 2969 SACI, 

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2,

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

a a 

3143 ALWN1, 3164 EAG1 XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

a * 

3217 HGIE2, 3229 NCOI,

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

A ~ 

3332 SACI, 3346 HIND3,

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM11051, 

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

A A A 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1,

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 

A A 

3589 DRA3, 3600 SAC2, 

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT 
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

A 

3611 ALWN1, 3655 PFLM1, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

A ' 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC 
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

A A 

3875 AAT2, 3890 BGLI,

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG 
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

a 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 

- . A A 

4229 DRD1, 4236 ALWN1, 

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI, 4308 BALI,

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

A 

4345 APAI, 

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC

a 

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRD1, 4511 TTH3I, 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

/N A 

4806 PFLM1, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

A 

4893 BGL2,

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

A 

4954 NCOI, 

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

A A 

5064 APAI, 5091 BALI,

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

A 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

A 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT 
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

A 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

A A A A 

5380 NOTI, 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC 

5449 APAI, 

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA 

A A A A 

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

A AAA 

5548 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI,

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT 
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 

5650 APAI, 5696 CLAI,

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValGlyAlaProLeu
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCCCTCTT 
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGCCGCGGGGAGAA 

5724 HGIE2, 5750 KAS1 NARI, 5756 ECON1, 

GlyGlyAlaAlaArgAlaLeuAlaHisGlyValArgValLeuGluAspGlyValAsnTyr 
5762 GGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAACTAT
CCTCCGCGACGGTCCCGGGACCGCGTACCGCAGGCCCAAGACCTTCTGCCGCACTTGATA 

5772 BSTXI, 5775 APAI, 

AlaThrGlyAsnLeuProGlyCysSerOC AM 
5822 GCAACAGGGAACCTTCCTGGTTGCTCTTAATAGTCGAC 
CGTTGTCCCTTGGAAGGACCAACGAGAATTATCAGCTG 

5854 SALI, 

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI,

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA 
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG 
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

A. /\ 

550 SAC2, 560 DRD1,

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA

615 BSPH1, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys 
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC 
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRD1,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

81/100
WO 01/38360
PCT/US00/32326
FIGURE 21 - Page 3

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC 
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

A a a a 

1230 BSPH1, 1234 DRD1, 1237 AVA3, 1245 EAG1 XMA3, 1250 DRD1,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

A 

1369 NAEI,

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

A 

1385 DRD1, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

A A 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

A /S. 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

1643 BSTE2, 1677 ALWN1 PVU2, 

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT 
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

A 

1794 ESP1, 

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

/» a 

1878 SACI, 1899 BSPH1,

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

A 

1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

A A 

2004 NAEI, 2017 SMAI XMAI,

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

A A 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

A A 

2115 PVU2, 2159 ALWN1, 

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

A A 

2164 MST2, 2220 ECON1,

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

A A A 

2285 ESP1, 2300 PVU2, 2310 BAMHI, 

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC 
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A A A A 

2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A A 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

A 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

A 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG

a 

2809 AAT2,

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

A 

2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

A A 

2889 BALI, 2903 NHEI,

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG 
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

A A 

2966 ESP1, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2 , 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG

* a 

3143 ALWN1, 3164 EAG1 XMA3,

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

a >\ 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

A A 

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

A 

3437 EAM11051, 

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

AAA 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 
3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

3611 ALWN1, 3655 PFLM1, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG

3875 AAT2, 3890 BGLI , 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG 
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 
4229 DRD1, 4236 ALWN1,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC 
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

A A 

4301 BGLI, 4308 BALI,

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

A 

4452 SMAI XMAI, 

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

A A 

4508 DRD1, 4511 TTH3I, 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

A A 

4806 PFLM1, 4807 DRA3 , 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT

A 

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

A 

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA 
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

A y\. 

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

A 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

A 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

A 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC 
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

A /\ A /\ 

5380 NOTI, 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC 
5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA 

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

5548 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI, 

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT 
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 

5650 APAI, 5696 CLAI, 

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValOC AM
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCTAATAGTCGAC
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGATTATCAGCTG 

A /s 

5724 HGIE2, 5755 SALI , 
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG

303 ALWN1, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA
SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG 
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

A A 

550 SAC2, 560 DRD1, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA
a 

615 BSPH1, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys 
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

a 

816 BGLI, 833 DRD1, 

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

a 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

A 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

A 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG
a 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

A A A * A 

1230 BSPH1, 1234 DRD1 , 1237 AVA3 , 1245 EAG1 XMA3, 1250 DRD1, 



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG 
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

A 

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

A 

1385 DRD1, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

A a 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

A A 

1565 XHOI, 1586 NDEI , 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

A A 

1643 BSTE2, 1677 ALWN1 PVU2, 

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 
ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESP1,

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KAS1 NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGGGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPH1, 

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

1928 TTH3I, 

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWN1, 

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

2164 MST2, 2220 ECON1, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

2285 ESP1, 2300 PVU2, 2310 BAMHI, 
LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC 
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A AAA 

2425 BSAB1, 2441 AVR2, 2448 SSE83871, 2449 PSTI, 

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A ^ 

2480 ASE1, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

A 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG 
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

A 

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

A 

2850 EAG1 XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

A A 

2889 BALI , 2903 NHEI, 
ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG 
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

A A 

2966 ESP1, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

/\ 

3096 BGL2 , 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

A 

3143 ALWN1, 3164 EAG1 XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

A A 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

A A 

3332 SACI , 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

A 

3437 EAM11051, 

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

A A A 

3484 BAMHI, 3485 BSAB1, 3487 BSPE1, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 
3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

3611 ALWN1, 3655 PFLM1, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC 
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC 
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG 
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG 
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 



WO 01/38360 



PCT/US00/32326 



FIGURE 22 « PagelB 



4229 DRD1, 4236 ALWN1, 

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG

A A 

4301 BGLI, 4308 BALI , 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRD1 , 4511 TTH3I , 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLM1, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG

a 

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

f\ A 

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 
a 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A. 

5174 NOTI, 5175 EAG1 XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT 
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

A 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC 
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

A A A A 

5380 NOTI , 5381 EAG1 XMA3, 5390 AAT2, 5401 SMAI XMAI, 

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG 
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC 
5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA

A A A A 

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

A A A A 

5548 ALWN1, 5558 ESP1, 5564 SMAI XMAI, 5568 KPNI, 

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT 
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 

A /S 

5650 APAI, 5696 CLAI, 

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValGlyAlaProLeu
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCCCTCTT
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGCCGCGGGGAGAA 

A A 

5724 HGIE2, 5750 KAS1 NARI, 5756 ECON1, 

GlyGlyAlaAlaArgAlaOC AM 
5762 GGAGGCGCTGCCAGGGCCTAATAGTCGAC
CCTCCGCGACGGTCCCGGATTATCAGCTG 

5785 SALI, 
<160> 19
<213> Artificial Sequence

<220> 
<221> CDS 

<222> (1990)..(7302)
<220> 

<223> Description of Artificial Sequence: Hepatitis C pns345 
<400> 1 



cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60
agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120
tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180
ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240
atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300
ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360
cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420
gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480
gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540
ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600
ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660
atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720
cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780
tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840
agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900
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tttggcacca 


aaatcaacgg 


gactttccaa 


aatgtcgtaa 


taaccccgcc 


ccgttgacgc 


960 


aaatgggcgg 


taggcgtgta 


cggtgggagg 


tctatataag 


cagagctcgt 


ttagtgaacc 


1020 


gtcagatcgc 


ctggagacgc 


catccacgct 


gttttgacct 


ccatagaaga 


caccgggacc 


1080 


gatccagcct 


ccgcggccgg 


gaacggtgca 


ttggaacgcg 


gattccccgt 


gccaagagtg 


1140 


acgtaagtac 


cgcctataga 


ctctataggc 


acaccccttt 


ggctcttatg 


catgctatac 


1200 


tgtttttggc 


ttggggccta 


tacacccccg 


ctccttatgc 


tataggtgat 


ggtatagctt 


1260 


agcctatagg 


tgtgggttat 


tgaccattat 


tgaccactcc 


cctattggtg 


acgatacttt 


1320 


ccattactaa 


tccataacat 


ggctctttgc 


cacaactatc 


tctattggct 


atatgccaat 


1380 


actctgtcct 


tcagagactg 


acacggactc 


tgtattttta 


caggatgggg 


tccatttatt 


1440 


atttacaaat 


tcacatatac 


aacaacgccg 


tcccccgtgc 


ccgcagtttt 


tattaaacat 


1500 


agcgtgggat 


ctccgacatc 


tcgggtacgt 


gttccggaca 


tgggctcttc 


tccggtagcg 


1560 


gcggagcttc 


cacatccgag 


ccctggtccc 


atccgtccag 


cggctcatgg 


tcgctcggca 


1620 


gctccttgct 


cctaacagtg 


gaggccagac 


ttaggcacag 


cacaatgccc 


accaccacca 


1680 


gtgtgccgca 


caaggccgtg 


gcggtagggt 


atgtgtctga 


aaatgagctc 


ggagattggg 


1740 


ctcgcacctg 


gacgcagatg 


gaagacttaa 


ggcagcggca 


gaagaagatg 


caggcagctg 


1800 


agttgttgta 


ttctgataag 


agtcagaggt 


aactcccgtt 


gcggtgctgt 


taacggtgga 


1860 


gggcagtgta 


gtctgagcag 


tactcgttgc 


tgccgcgcgc 


gccaccagac 


ataatagctg 


1920 


acagactaac 


agactgttcc 


tttccatggg 


tcttttctgc 


agtcaccgtc 


gtcgacctaa 


1980 



gaattcacc atg gct gca tat gca gct cag ggc tat aag gtg cta gta ctc 2031 
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu 
1 5 10 

aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac atg tcc aag 2079 
Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys 
15 20 25 30 

gct cat ggg atc gat cct aac atc agg acc ggg gtg aga aca att acc 2127 
Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr 
35 40 45 

act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc ctt gcc gac 2175 
Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp 
50 55 60 

ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt gac gag tgc 2223 
Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys 
65 70 75 

cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act gtc ctt gac 2271 
His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp 
80 85 90 

caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc acc gcc acc 2319 
Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr 
95 100 105 110 

cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag gag gtt gct 2367 
Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala 
115 120 125 

ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct atc ccc ctc 2415 
Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu 
130 135 140 

gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat tca aag aag 2463 
Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys 
145 150 155 

aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc atc aat gcc 2511 
Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala 
160 165 170 

gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg acc agc ggc 2559 
Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly 
175 180 185 190 

gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc tat acc ggc 2607 
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly 
195 200 205 

gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc cag aca gtc 2655 
Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val 
210 215 220 

gat ttc agc ctt gac cct acc ttc acc att gag aca atc acg ctc ccc 2703 
Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro 
225 230 235 

caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act ggc agg ggg 2751 
Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly 
240 245 250 

aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc ccc tcc ggc 2799 
Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly 
255 260 265 270 

atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca ggc tgt gct 2847 
Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala 
275 280 285 

tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta cga gcg tac 2895 
Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr 
290 295 300 

atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt gaa ttt tgg 2943 
Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp 
305 310 315 

gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac ttt cta tcc 2991 
Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser 
320 325 330 

cag aca aag cag agt ggg gag aac ctt cct tac ctg gta gcg tac caa 3039 
Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln 
335 340 345 350 

gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg tgg gac cag 3087 
Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln 
355 360 365 

atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat ggg cca aca 3135 
Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr 
370 375 380 

ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc acc ctg acg 3183 
Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr 
385 390 395 

cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc gac ctg gag 3231 
His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu 
400 405 410 

gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg gct gct ttg 3279 
Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu 
415 420 425 430 

gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg ggc agg gtc 3327 
Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val 
435 440 445 

gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa gtc ctc tac 3375 
Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr 
450 455 460 

cga gag ttc gat gag atg gaa gag tgc tct cag cac tta ccg tac atc 3423 
Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile 
465 470 475 

gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag gcc ctc ggc 3471 
Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly 
480 485 490 

ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc cct gct gtc 3519 
Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val 
495 500 505 510 

cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag cat atg tgg 3567 
Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp 
515 520 525 

aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca acg ctg cct 3615 
Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro 
530 535 540 

ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct gct gtc acc 3663 
Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr 
545 550 555 

agc cca cta acc act agc caa acc ctc ctc ttc aac ata ttg ggg ggg 3711 
Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly 
560 565 570 

tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act gcc ttt gtg 3759 
Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val 
575 580 585 590 

ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga ctg ggg aag 3807 
Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys 
595 600 605 

gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg gcg gga gct 3855 
Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala 
610 615 620 

ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc acg gag gac 3903 
Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp 
625 630 635 

ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc ctc gta gtc 3951 
Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val 
640 645 650 

ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc ccg ggc gag 3999 
Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu 
655 660 665 670 

ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc tcc cgg ggg 4047 
Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly 
675 680 685 

aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat gca gct gcc 4095 
Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala 
690 695 700 

cgc gtc act gcc ata ctc agc agc ctc act gta acc cag ctc ctg agg 4143 
Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg 
705 710 715 

cga ctg cac cag tgg ata agc tcg gag tgt acc act cca tgc tcc ggt 4191 
Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly 
720 725 730 

tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg ttg agc gac 4239 
Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp 
735 740 745 750 

ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg cct ggg atc 4287 
Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile 
755 760 765 

ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg cga ggg gac 4335 
Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp 
770 775 780 

ggc atc atg cac act cgc tgc cac tgt gga gct gag atc act gga cat 4383 
Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His 
785 790 795 

gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc tgc agg aac 4431 
Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn 
800 805 810 

atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg ggc ccc tgt 4479 
Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys 
815 820 825 830 

acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg agg gtg tct 4527 
Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser 
835 840 845 

gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc cac tac gtg 4575 
Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val 
850 855 860 

acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag gtc cca tcg 4623 
Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser 
865 870 875 

ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat agg ttt gcg 4671 
Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala 
880 885 890 

ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc aga gta gga 4719 
Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly 
895 900 905 910 

ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag ccc gaa ccg 4767 
Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro 
915 920 925 

gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc cat ata aca 4815 
Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr 
930 935 940 

gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc ccc tct gtg 4863 
Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val 
945 950 955 

gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc aag gca act 4911 
Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr 
960 965 970 

tgc acc gct aac cat gac tcc cct gat gct gag ctc ata gag gcc aac 4959 
Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn 
975 980 985 990 

ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg gtt gag tca 5007 
Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser 
995 1000 1005 

gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt gtg gcg gag 5055 
Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu 
1010 1015 1020 

gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg cgg aag tct 5103 
Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser 
1025 1030 1035 

cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg gac tat aac 5151 
Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn 
1040 1045 1050 

ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa cca cct gtg 5199 
Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val 
1055 1060 1065 1070 

gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct gtg cct ccg 5247 
Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro 
1075 1080 1085 

cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc cta tct act 5295 
Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr 
1090 1095 1100 

gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc tca act tcc 5343 
Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser 
1105 1110 1115 

ggc att acg ggc gac aat acg aca aca tcc tct gag ccc gcc cct tct 5391 
Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser 
1120 1125 1130 

ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc atg ccc ccc 5439 
Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro 
1135 1140 1145 1150 

ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg tca tgg tca 5487 
Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser 
1155 1160 1165 

acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc tgc tca atg 5535 
Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met 
1170 1175 1180 

tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc gcg gaa gaa 5583 
Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu 
1185 1190 1195 

cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta cgt cac cac 5631 
Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His 
1200 1205 1210 

aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa agg cag aag 5679 
Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys 
1215 1220 1225 1230 

aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat tac cag gac 5727 
Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp 
1235 1240 1245 

gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag gct aac ttg 5775 
Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu 
1250 1255 1260 

cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac tca gcc aaa 5823 
Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys 
1265 1270 1275 

tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat gcc aga aag 5871 
Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys 
1280 1285 1290 

gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg gaa gac aat 5919 
Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn 
1295 1300 1305 1310 

gta aca cca ata gac act acc atc atg gct aag aac gag gtt ttc tgc 5967 
Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys 
1315 1320 1325 

gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc atc gtg ttc 6015 
Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe 
1330 1335 1340 

ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg tac gac gtg 6063 
Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val 
1345 1350 1355 

gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac gga ttc caa 6111 
Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln 
1360 1365 1370 

tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg tgg aag tcc 6159 
Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser 
1375 1380 1385 1390 

aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc ttt gac tcc 6207 
Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser 
1395 1400 1405 

aca gtc act gag agc gac atc cgt acg gag gag gca atc tac caa tgt 6255 
Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys 
1410 1415 1420 

tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc ctc acc gag 6303 
Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu 
1425 1430 1435 

agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg gag aac tgc 6351 
Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys 
1440 1445 1450 

ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act agc tgt ggt 6399 
Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly 
1455 1460 1465 1470 

aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt cga gcc gca 6447 
Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala 
1475 1480 1485 

ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac tta gtc gtt 6495 
Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val 
1490 1495 1500 

atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc ctg aga gcc 6543 
Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala 
1505 1510 1515 

ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg gac ccc cca 6591 
Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro 
1520 1525 1530 

caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc tcc aac gtg 6639 
Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val 
1535 1540 1545 1550 

tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac ctc acc cgt 6687 
Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg 
1555 1560 1565 

gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca gca aga cac 6735 
Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His 
1570 1575 1580 

act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt gcc ccc aca 6783 
Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr 
1585 1590 1595 

ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc gtc ctt ata 6831 
Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile 
1600 1605 1610 

gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc tac ggg gcc 6879 
Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala 
1615 1620 1625 1630 

tgc tac tcc ata gaa cca ctg gat cta cct cca atc att caa aga ctc 6927 
Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu 
1635 1640 1645 

cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca ggt gaa atc 6975 
His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile 
1650 1655 1660 

aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg ccc ttg cga 7023 
Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg 
1665 1670 1675 

gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt ctg gcc aga 7071 
Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg 
1680 1685 1690 

gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac tgg gca gta 7119 
Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val 
1695 1700 1705 1710 

aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc cag ctg gac 7167 
Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp 
1715 1720 1725 

ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac att tat cac 7215 
Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His 
1730 1735 1740 

agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc cta ctc ctg 7263 
Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu 
1745 1750 1755 

ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga tgaaggttgg 7312 
Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg 
1760 1765 1770 

ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaaa ggcgcgccaa gatatcaagg 7372 
atccactacg cgttagagct cgctgatcag cctcgactgt gccttctagt tgccagccat 7432 
ctgttgtttg cccctccccc gtgccttcct tgaccctgga aggtgccact cccactgtcc 7492 
tttcctaata aaatgaggaa attgcatcgc attgtctgag taggtgtcat tctattctgg 7552 
ggggtggggt ggggcaggac agcaaggggg aggattggga agacaatagc aggcatgctg 7612 
gggagctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga 7672 
gcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca 7732 
ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg 7792 
ctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 7852 
cagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 7912 
ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 7972 
tcgggaagcg tggcgctttc tcaatgctca cgctgtaggt atctcagttc ggtgtaggtc 8032 
gttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 8092 
tccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 8152 
gccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 8212 
tggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag 8272 
ccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 8332 
agcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 8392 
gatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 8452 
attttggtca tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 8512 
agttttaaat caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 8572 
atcagtgagg cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 8632 
cccgtcgtgt agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 8692 
ataccgcgag acccacgctc accggctcca gatttatcag caataaacca gccagccgga 8752 
agggccgagc gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 8812 
tgccgggaag ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 8872 
gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 8932 
caacgatcaa ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 8992 
ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 9052 
gcactgcata attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 9112 
tactcaacca agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 9172 
tcaatacggg ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 9232 
cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 9292 
cccactcgtg cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 9352 
gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 9412 
atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 9472 
agcggataca tatttgaatg tatttagaaa aataaacaaa taggggttcc gcgcacattt 9532 
ccccgaaaag tgccacctga cgtctaagaa accattatta tcatgacatt aacctataaa 9592 
aataggcgta tcacgaggcc ctttcgtc 9620 



<210> 2 
<211> 1771 
<212> PRT 
<213> Hepatitis C virus 
<220> 
<223> Description of Artificial Sequence: Hepatitis C pns345 
<400> 2 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro 
1 5 10 15 

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His 
20 25 30 

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly 
35 40 45 

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly 
50 55 60 

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser 
65 70 75 80 

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala 
85 90 95 

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser 
115 120 125 

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val 
130 135 140 

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys 
145 150 155 160 

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala 
165 170 175 

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val 
180 185 190 

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 
195 200 205 

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe 
210 215 220 

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp 
225 230 235 240 

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro 
245 250 255 

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe 
260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly 
305 310 315 320 

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr 
325 330 335 

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr 
340 345 350 

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp 
355 360 365 

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu 
370 375 380 

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro 
385 390 395 400 

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val 
405 410 415 

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu 
435 440 445 

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu 
450 455 460 

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln 
465 470 475 480 

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu 
485 490 495 

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr 
500 505 510 

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe 
515 520 525 

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn 
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro 
545 550 555 560 

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val 
565 570 575 

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala 
580 585 590 

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu 
595 600 605 

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val 
610 615 620 

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val 
625 630 635 640 

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val 
645 650 655 

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala 
660 665 670 

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His 
675 680 685 

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu 
705 710 715 720 

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp 
725 730 735 

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys 
740 745 750 

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe 
755 760 765 

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile 
770 775 780 

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys 
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp 
805 810 815 

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro 
820 825 830 

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu 
835 840 845 

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly 
850 855 860 

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu 
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val 
915 920 925 

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu 
930 935 940 

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr 
965 970 975 

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu 
980 985 990 

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn 
995 1000 1005 

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp 
1010 1015 1020 

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg 
1025 1030 1035 1040 

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro 
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile 
1105 1110 1115 1120 

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys 
1185 1190 1195 1200 

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu 
1205 1210 1215 

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val 
1220 1225 1230 

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu 
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280 

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr 
1300 1305 1310 

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln 
1315 1320 1325 

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp 
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360 

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser 
1365 1370 1375 

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys 
1380 1385 1390 

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp 
1410 1415 1420 

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu 
1425 1430 1435 1440 

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu 
1475 1480 1485 

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys 
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr 
1505 1510 1515 1520 

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro 
1525 1530 1535 

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val 
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp 
1585 1590 1595 1600 

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg 
1605 1610 1615 

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr 
1620 1625 1630 

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly 
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg 
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680 

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr 
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser 
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val 
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala 
1745 1750 1755 1760 

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg 
1765 1770 



<210> 3
<211> 9620
<212> DNA

<213> Artificial Sequence
<220>
<221> CDS
<222> (1990)..(7302)
<220>

<223> Description of Artificial Sequence: pDeltaNS3NS5
<400> 3



cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60

agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120

tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180

ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240

atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300

ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360

cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420

gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480

gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540

ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600

ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660

atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720

cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780

tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840

agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900

tttggcacca aaatcaacgg gactttccaa aatgtcgtaa taaccccgcc ccgttgacgc 960

aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctcgt ttagtgaacc 1020

gtcagatcgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080

gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140

acgtaagtac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200

tgtttttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260

agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320

ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380

actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440

atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500

agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560

gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620

gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680

gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740

ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800

agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860

gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920

acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980

gaattcacc atg gct gca tat gca gct cag ggc tat aag gtg cta gta ctc 2031
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu
1 5 10

aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac atg tcc aag 2079
Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys
15 20 25 30

gct cat ggg atc gat cct aac atc agg acc ggg gtg aga aca att acc 2127
Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr
35 40 45

act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc ctt gcc gac 2175
Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp
50 55 60

ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt gac gag tgc 2223
Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys
65 70 75

cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act gtc ctt gac 2271
His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp
80 85 90

caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc acc gcc acc 2319
Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr
95 100 105 110

cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag gag gtt gct 2367
Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala
115 120 125

ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct atc ccc ctc 2415
Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu
130 135 140

gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat tca aag aag 2463
Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys
145 150 155

aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc atc aat gcc 2511
Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala
160 165 170

gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg acc agc ggc 2559
Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly
175 180 185 190

gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc tat acc ggc 2607
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly
195 200 205

gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc cag aca gtc 2655
Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val
210 215 220

gat ttc agc ctt gac cct acc ttc acc att gag aca atc acg ctc ccc 2703
Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro
225 230 235

caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act ggc agg ggg 2751
Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly
240 245 250

aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc ccc tcc ggc 2799
Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly
255 260 265 270

atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca ggc tgt gct 2847
Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala
275 280 285

tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta cga gcg tac 2895
Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr
290 295 300

atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt gaa ttt tgg 2943
Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp
305 310 315

gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac ttt cta tcc 2991
Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser
320 325 330

cag aca aag cag agt ggg gag aac ctt cct tac ctg gta gcg tac caa 3039
Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln
335 340 345 350

gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg tgg gac cag 3087
Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln
355 360 365

atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat ggg cca aca 3135
Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr
370 375 380

ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc acc ctg acg 3183
Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr
385 390 395

cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc gac ctg gag 3231
His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu
400 405 410

gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg gct gct ttg 3279
Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu
415 420 425 430

gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg ggc agg gtc 3327
Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val
435 440 445

gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa gtc ctc tac 3375
Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr
450 455 460

cga gag ttc gat gag atg gaa gag tgc tct cag cac tta ccg tac atc 3423
Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile
465 470 475

gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag gcc ctc ggc 3471
Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly
480 485 490

ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc cct gct gtc 3519
Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val
495 500 505 510

cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag cat atg tgg 3567
Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp
515 520 525

aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca acg ctg cct 3615
Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro
530 535 540

ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct gct gtc acc 3663
Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr
545 550 555

agc cca cta acc act agc caa acc ctc ctc ttc aac ata ttg ggg ggg 3711
Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly
560 565 570

tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act gcc ttt gtg 3759
Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val
575 580 585 590

ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga ctg ggg aag 3807
Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys
595 600 605

gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg gcg gga gct 3855
Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala
610 615 620

ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc acg gag gac 3903
Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp
625 630 635

ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc ctc gta gtc 3951
Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val
640 645 650

ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc ccg ggc gag 3999
Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu
655 660 665 670

ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc tcc cgg ggg 4047
Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly
675 680 685

aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat gca gct gcc 4095
Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala
690 695 700

cgc gtc act gcc ata ctc agc agc ctc act gta acc cag ctc ctg agg 4143
Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg
705 710 715

cga ctg cac cag tgg ata agc tcg gag tgt acc act cca tgc tcc ggt 4191
Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly
720 725 730

tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg ttg agc gac 4239
Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp
735 740 745 750

ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg cct ggg atc 4287
Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile
755 760 765

ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg cga ggg gac 4335
Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp
770 775 780

ggc atc atg cac act cgc tgc cac tgt gga gct gag atc act gga cat 4383
Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His
785 790 795

gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc tgc agg aac 4431
Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn
800 805 810

atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg ggc ccc tgt 4479
Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys
815 820 825 830

acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg agg gtg tct 4527
Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser
835 840 845

gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc cac tac gtg 4575
Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val
850 855 860

acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag gtc cca tcg 4623
Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser
865 870 875

ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat agg ttt gcg 4671
Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala
880 885 890

ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc aga gta gga 4719
Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly
895 900 905 910

ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag ccc gaa ccg 4767
Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro
915 920 925

gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc cat ata aca 4815
Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr
930 935 940

gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc ccc tct gtg 4863
Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val
945 950 955

gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc aag gca act 4911
Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr
960 965 970

tgc acc gct aac cat gac tcc cct gat gct gag ctc ata gag gcc aac 4959
Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn
975 980 985 990

ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg gtt gag tca 5007
Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser
995 1000 1005

gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt gtg gcg gag 5055
Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu
1010 1015 1020

gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg cgg aag tct 5103
Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser
1025 1030 1035

cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg gac tat aac 5151
Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn
1040 1045 1050

ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa cca cct gtg 5199
Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val
1055 1060 1065 1070

gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct gtg cct ccg 5247
Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro
1075 1080 1085

cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc cta tct act 5295
Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr
1090 1095 1100

gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc tca act tcc 5343
Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser
1105 1110 1115

ggc att acg ggc gac aat acg aca aca tcc tct gag ccc gcc cct tct 5391
Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser
1120 1125 1130

ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc atg ccc ccc 5439
Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro
1135 1140 1145 1150

ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg tca tgg tca 5487
Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser
1155 1160 1165

acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc tgc tca atg 5535
Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met
1170 1175 1180

tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc gcg gaa gaa 5583
Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu
1185 1190 1195

cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta cgt cac cac 5631
Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His
1200 1205 1210

aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa agg cag aag 5679
Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys
1215 1220 1225 1230

aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat tac cag gac 5727
Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp
1235 1240 1245

gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag gct aac ttg 5775
Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu
1250 1255 1260

cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac tca gcc aaa 5823
Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys
1265 1270 1275

tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat gcc aga aag 5871
Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys
1280 1285 1290

gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg gaa gac aat 5919
Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn
1295 1300 1305 1310

gta aca cca ata gac act acc atc atg gct aag aac gag gtt ttc tgc 5967
Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys
1315 1320 1325

gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc atc gtg ttc 6015
Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe
1330 1335 1340

ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg tac gac gtg 6063
Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val
1345 1350 1355

gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac gga ttc caa 6111
Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln
1360 1365 1370

tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg tgg aag tcc 6159
Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser
1375 1380 1385 1390

aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc ttt gac tcc 6207
Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser
1395 1400 1405

aca gtc act gag agc gac atc cgt acg gag gag gca atc tac caa tgt 6255
Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys
1410 1415 1420

tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc ctc acc gag 6303
Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu
1425 1430 1435

agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg gag aac tgc 6351
Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys
1440 1445 1450

ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act agc tgt ggt 6399
Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly
1455 1460 1465 1470

aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt cga gcc gca 6447
Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala
1475 1480 1485

ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac tta gtc gtt 6495
Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val
1490 1495 1500

atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc ctg aga gcc 6543
Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala
1505 1510 1515

ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg gac ccc cca 6591
Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro
1520 1525 1530

caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc tcc aac gtg 6639
Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val
1535 1540 1545 1550

tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac ctc acc cgt 6687
Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg
1555 1560 1565

gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca gca aga cac 6735
Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His
1570 1575 1580

act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt gcc ccc aca 6783
Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr
1585 1590 1595

ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc gtc ctt ata 6831
Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile
1600 1605 1610

gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc tac ggg gcc 6879
Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala
1615 1620 1625 1630

tgc tac tcc ata gaa cca ctg gat cta cct cca atc att caa aga ctc 6927
Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu
1635 1640 1645

cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca ggt gaa atc 6975
His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile
1650 1655 1660

aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg ccc ttg cga 7023
Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg
1665 1670 1675

gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt ctg gcc aga 7071
Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg
1680 1685 1690

gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac tgg gca gta 7119
Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val
1695 1700 1705 1710

aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc cag ctg gac 7167
Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp
1715 1720 1725

ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac att tat cac 7215
Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His
1730 1735 1740

agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc cta ctc ctg 7263
Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu
1745 1750 1755

ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga tgaaggttgg 7312
Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaaa ggcgcgccaa gatatcaagg 7372

atccactacg cgttagagct cgctgatcag cctcgactgt gccttctagt tgccagccat 7432

ctgttgtttg cccctccccc gtgccttcct tgaccctgga aggtgccact cccactgtcc 7492

tttcctaata aaatgaggaa attgcatcgc attgtctgag taggtgtcat tctattctgg 7552

ggggtggggt ggggcaggac agcaaggggg aggattggga agacaatagc aggcatgctg 7612

gggagctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga 7672

gcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca 7732

ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg 7792

ctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 7852

cagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 7912

ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 7972

tcgggaagcg tggcgctttc tcaatgctca cgctgtaggt atctcagttc ggtgtaggtc 8032

gttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 8092

tccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 8152

gccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 8212

tggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag 8272

ccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 8332

agcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 8392

gatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 8452

attttggtca tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 8512

agttttaaat caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 8572

atcagtgagg cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 8632

cccgtcgtgt agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 8692

ataccgcgag acccacgctc accggctcca gatttatcag caataaacca gccagccgga 8752

agggccgagc gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 8812

tgccgggaag ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 8872

gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 8932

caacgatcaa ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 8992

ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 9052

gcactgcata attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 9112

tactcaacca agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 9172

tcaatacggg ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 9232

cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 9292

cccactcgtg cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 9352

gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 9412

atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 9472

agcggataca tatttgaatg tatttagaaa aataaacaaa taggggttcc gcgcacattt 9532
ccccgaaaag tgccacctga cgtctaagaa accattatta tcatgacatt aacctataaa 9592

aataggcgta tcacgaggcc ctttcgtc 9620



<210> 4
<211> 1771
<212> PRT

<213> Artificial Sequence
<400> 4

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu lie Thr Ser Cys Ser Ser Asn Val Ser Val 
1540 1545 1550 



Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp 
1585 1590 1595 1600 

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg 
1605 1610 1615 



Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr 
1620 1625 1630 

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly 
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg 
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680 

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr 
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser 
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val 
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala 
1745 1750 1755 1760 

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg 
1765 1770 



<210> 5 
<211> 4282 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pCMVII 
<400> 5 

tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60 
cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120 
ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180 
accatatgaa gctttttgca aaagcctagg cctccaaaaa agcctcctca ctacttctgg 240 
aatagctcag aggccgaggc ggcctcggcc tctgcataaa taaaaaaaat tagtcagcca 300 
tggggcggag aatgggcgga actgggcggg gagggaatta ttggctattg gccattgcat 360 
acgttgtatc tatatcataa tatgtacatt tatattggct catgtccaat atgaccgcca 420 
tgttgacatt gattattgac tagttattaa tagtaatcaa ttacggggtc attagttcat 480 
agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc tggctgaccg 540 
cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt aacgccaata 600 
gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca cttggcagta 660 
catcaagtgt atcatatgcc aagtccgccc cctattgacg tcaatgacgg taaatggccc 720 
gcctggcatt atgcccagta catgacctta cgggactttc ctacttggca gtacatctac 780 
gtattagtca tcgctattac catggtgatg cggttttggc agtacaccaa tgggcgtgga 840 
tagcggtttg actcacgggg atttccaagt ctccacccca ttgacgtcaa tgggagtttg 900 
ttttggcacc aaaatcaacg ggactttcca aaatgtcgta ataaccccgc cccgttgacg 960 
caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctcg tttagtgaac 1020 
cgtcagatcg cctggagacg ccatccacgc tgttttgacc tccatagaag acaccgggac 1080 
cgatccagcc tccgcggccg ggaacggtgc attggaacgc ggattccccg tgccaagagt 1140 
gacgtaagta ccgcctatag actctatagg cacacccctt tggctcttat gcatgctata 1200 
ctgtttttgg cttggggcct atacaccccc gcttccttat gctataggtg atggtatagc 1260 
ttagcctata ggtgtgggtt attgaccatt attgaccact cccctattgg tgacgatact 1320 
ttccattact aatccataac atggctcttt gccacaacta tctctattgg ctatatgcca 1380 
atactctgtc cttcagagac tgacacggac tctgtatttt tacaggatgg ggtcccattt 1440 
attatttaca aattcacata tacaacaacg ccgtcccccg tgcccgcagt ttttattaaa 1500 
catagcgtgg gatctccacg cgaatctcgg gtacgtgttc cggacatggg ctcttctccg 1560 
gtagcggcgg agcttccaca tccgagccct ggtcccatgc ctccagcggc tcatggtcgc 1620 
tcggcagctc cttgctccta acagtggagg ccagacttag gcacagcaca atgcccacca 1680 
ccaccagtgt gccgcacaag gccgtggcgg tagggtatgt gtctgaaaat gagctcggag 1740 
attgggctcg caccgctgac gcagatggaa gacttaaggc agcggcagaa gaagatgcag 1800 
gcagctgagt tgttgtattc tgataagagt cagaggtaac tcccgttgcg gtgctgttaa 1860 
cggtggaggg cagtgtagtc tgagcagtac tcgttgctgc cgcgcgcgcc accagacata 1920 
atagctgaca gactaacaga ctgttccttt ccatgggtct tttctgcagt caccgtcgtc 1980 
gacctaagaa ttcagactcg agcaagtcta gaaaggcgcg ccaagatatc aaggatccac 2040 
tacgcgttag agctcgctga tcagcctcga ctgtgccttc tagttgccag ccatctgttg 2100 
tttgcccctc ccccgtgcct tccttgaccc tggaaggtgc cactcccact gtcctttcct 2160 
aataaaatga ggaaattgca tcgcattgtc tgagtaggtg tcattctatt ctggggggtg 2220 
gggtggggca ggacagcaag ggggaggatt gggaagacaa tagcaggcat gctggggagc 2280 
tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 2340 
tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 2400 
aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg 2460 
tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 2520 
tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 2580 
cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 2640 
agcgtggcgc tttctcaatg ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 2700 
tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 2760 
aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 2820 
ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 2880 
cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 2940 
accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 3000 
ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 3060 
ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 3120 
gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 3180 
aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 3240 
gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 3300 
gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 3360 
cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 3420 
gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 3480 
gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca 3540 
ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 3600 
tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 3660 
ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 3720 
cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 3780 
accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata 3840 
cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 3900 
tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 3960 
cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 4020 
acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 4080 
atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 4140 
tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga 4200 
aaagtgccac ctgacgtcta agaaaccatt attatcatga cattaaccta taaaaatagg 4260 
cgtatcacga ggccctttcg tc 4282 



<210> 6 
<211> 6299 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pNS34a 

<220> 
<221> CDS 

<222> (1990)..(4047) 
<400> 6 



cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60 
agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120 
tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180 
ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240 
atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300 
ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360 
cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420 
gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480 
gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540 
ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600 
ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660 
atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720 
cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780 
tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840 
agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900 
tttggcacca aaatcaacgg gactttccaa aatgtcgtaa taaccccgcc ccgttgacgc 960 
aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctcgt ttagtgaacc 1020 
gtcagatcgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080 
gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140 
acgtaagtac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200 
tgtttttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260 
agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320 
ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380 
actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440 
atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500 
agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560 
gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620 
gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680 
gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740 
ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800 
agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860 
gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920 
acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980 



gaattcacc atg gcg ccc atc acg gcg tac gcc cag cag aca agg ggc ctc 2031 
Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu 
1 5 10 

cta ggg tgc ata atc acc agc cta act ggc cgg gac aaa aac caa gtg 2079 
Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val 
15 20 25 30 

gag ggt gag gtc cag att gtg tca act gct gcc caa acc ttc ctg gca 2127 
Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala 
35 40 45 

acg tgc atc aat ggg gtg tgc tgg act gtc tac cac ggg gcc gga acg 2175 
Thr Cys Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr 
50 55 60 

agg acc atc gcg tca ccc aag ggt cct gtc atc cag atg tat acc aat 2223 
Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn 
65 70 75 

gta gac caa gac ctt gtg ggc tgg ccc gct tcg caa ggt acc cgc tca 2271 
Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg Ser 
80 85 90 

ttg aca ccc tgc act tgc ggc tcc tcg gac ctt tac ctg gtc acg agg 2319 
Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg 
95 100 105 110 

cac gcc gat gtc att ccc gtg cgc cgg cgg ggt gat agc agg ggc agc 2367 
His Ala Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser 
115 120 125 

ctg ctg tcg ccc cgg ccc att tcc tac ttg aaa ggc tcc tcg ggg ggt 2415 
Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly 
130 135 140 

ccg ctg ttg tgc ccc gcg ggg cac gcc gtg ggc ata ttt agg gcc gcg 2463 
Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala 
145 150 155 

gtg tgc acc cgt gga gtg gct aag gcg gtg gac ttt atc cct gtg gag 2511 
Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu 
160 165 170 

aac cta gag aca acc atg agg tcc ccg gtg ttc acg gat aac tcc tct 2559 
Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser 
175 180 185 190 

cca cca gta gtg ccc cag agc ttc cag gtg gct cac ctc cat gct ccc 2607 
Pro Pro Val Val Pro Gln Ser Phe Gln Val Ala His Leu His Ala Pro 
195 200 205 

aca ggc agc ggc aaa agc acc aag gtc ccg gct gca tat gca gct cag 2655 
Thr Gly Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln 
210 215 220 

ggc tat aag gtg cta gta ctc aac ccc tct gtt gct gca aca ctg ggc 2703 
Gly Tyr Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly 
225 230 235 

ttt ggt gct tac atg tcc aag gct cat ggg atc gat cct aac atc agg 2751 
Phe Gly Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg 
240 245 250 

acc ggg gtg aga aca att acc act ggc agc ccc atc acg tac tcc acc 2799 
Thr Gly Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr 
255 260 265 270 

tac ggc aag ttc ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac 2847 
Tyr Gly Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp 
275 280 285 

ata ata att tgt gac gag tgc cac tcc acg gat gcc aca tcc atc ttg 2895 
Ile Ile Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu 
290 295 300 

ggc att ggc act gtc ctt gac caa gca gag act gcg ggg gcg aga ctg 2943 
Gly Ile Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu 
305 310 315 

gtt gtg ctc gcc acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat 2991 
Val Val Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His 
320 325 330 

ccc aac atc gag gag gtt gct ctg tcc acc acc gga gag atc cct ttt 3039 
Pro Asn Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe 
335 340 345 350 

tac ggc aag gct atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc 3087 
Tyr Gly Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu 
355 360 365 

atc ttc tgt cat tca aag aag aag tgc gac gaa ctc gcc gca aag ctg 3135 
Ile Phe Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu 
370 375 380 

gtc gca ttg ggc atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg 3183 
Val Ala Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val 
385 390 395 

tcc gtc atc ccg acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc 3231 
Ser Val Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala 
400 405 410 

ctc atg acc ggc tat acc ggc gac ttc gac tcg gtg ata gac tgc aat 3279 
Leu Met Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn 
415 420 425 430 

acg tgt gtc acc cag aca gtc gat ttc agc ctt gac cct acc ttc acc 3327 
Thr Cys Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr 
435 440 445 

att gag aca atc acg ctc ccc caa gat gct gtc tcc cgc act caa cgt 3375 
Ile Glu Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg 
450 455 460 

cgg ggc agg act ggc agg ggg aag cca ggc atc tac aga ttt gtg gca 3423 
Arg Gly Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala 
465 470 475 

ccg ggg gag cgc ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag 3471 
Pro Gly Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu 
480 485 490 

tgc tat gac gca ggc tgt gct tgg tat gag ctc acg ccc gcc gag act 3519 
Cys Tyr Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr 
495 500 505 510 

aca gtt agg cta cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc 3567 
Thr Val Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys 
515 520 525 

cag gac cat ctt gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat 3615 
Gln Asp His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His 
530 535 540 

ata gat gcc cac ttt cta tcc cag aca aag cag agt ggg gag aac ctt 3663 
Ile Asp Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu 
545 550 555 

cct tac ctg gta gcg tac caa gcc acc gtg tgc gct agg gct caa gcc 3711 
Pro Tyr Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala 
560 565 570 

cct ccc cca tcg tgg gac cag atg tgg aag tgt ttg att cgc ctc aag 3759 
Pro Pro Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys 
575 580 585 590 

ccc acc ctc cat ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt 3807 
Pro Thr Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val 
595 600 605 

cag aat gaa atc acc ctg acg cac cca gtc acc aaa tac atc atg aca 3855 
Gln Asn Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr 
610 615 620 

tgc atg tcg gcc gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt 3903 
Cys Met Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val 
625 630 635 

ggc ggc gtc ctg gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc 3951 
Gly Gly Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys 
640 645 650 

gtg gtc ata gtg ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata 3999 
Val Val Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile 
655 660 665 670 

cct gac agg gaa gtc ctc tac cga gag ttc gat gag atg gaa gag tgc 4047 
Pro Asp Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys 
675 680 685 




taggatccac tacgcgttag agctcgctga tcagcctcga ctgtgccttc tagttgccag 4107 
ccatctgttg tttgcccctc ccccgtgcct tccttgaccc tggaaggtgc cactcccact 4167 
gtcctttcct aataaaatga ggaaattgca tcgcattgtc tgagtaggtg tcattctatt 4227 
ctggggggtg gggtggggca ggacagcaag ggggaggatt gggaagacaa tagcaggcat 4287 
gctggggagc tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg 4347 
gcgagcggta tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa 4407 
cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc 4467 
gttgctggcg tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc 4527 
aagtcagagg tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag 4587 
ctccctcgtg cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct 4647 
cccttcggga agcgtggcgc tttctcaatg ctcacgctgt aggtatctca gttcggtgta 4707 
ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc 4767 
cttatccggt aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc 4827 
agcagccact ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt 4887 
gaagtggtgg cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct 4947 
gaagccagtt accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc 5007 
tggtagcggt ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca 5067 
agaagatcct ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta 5127 
agggattttg gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa 5187 
atgaagtttt aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg 5247 
cttaatcagt gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg 5307 
actccccgtc gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc 5367 
aatgataccg cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc 5427 
cggaagggcc gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa 5487 
ttgttgccgg gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc 5547 
cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg 5607 
ttcccaacga tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc 5667 
cttcggtcct ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat 5727 
ggcagcactg cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg 5787 
tgagtactca accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc 5847 
ggcgtcaata cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg 5907 
aaaacgttct tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat 5967 
gtaacccact cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg 6027 
gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg 6087 
ttgaatactc atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct 6147 
catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac 6207 
atttccccga aaagtgccac ctgacgtcta agaaaccatt attatcatga cattaaccta 6267 
taaaaatagg cgtatcacga ggccctttcg tc 6299 



<210> 7 
<211> 686 
<212> PRT 

<213> Artificial Sequence 
<220> 






WO 01/38360 PCT/US00/32326 



<223> Description of Artificial Sequence: pNS34a 
<400> 7 

Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu Leu Gly 
1 5 10 15 

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly 
20 25 30 

Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala Thr Cys 
35 40 45 

Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr Arg Thr 
50 55 60 

Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn Val Asp 
65 70 75 80 

Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg Ser Leu Thr 
85 90 95 

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala 
100 105 110 

Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu 
115 120 125 

Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu 
130 135 140 

Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys 
145 150 155 160 

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn Leu 
165 170 175 

Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser Pro Pro 
180 185 190 

Val Val Pro Gln Ser Phe Gln Val Ala His Leu His Ala Pro Thr Gly 
195 200 205 

Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln Gly Tyr 
210 215 220 

Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly 
225 230 235 240 

Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly 
245 250 255 

Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly 
260 265 270 

Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile 
275 280 285 

Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile 
290 295 300 

Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val 
305 310 315 320 

Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn 
325 330 335 

Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly 
340 345 350 

Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe 
355 360 365 

Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala 
370 375 380 

Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val 
385 390 395 400 

Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met 
405 410 415 

Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys 
420 425 430 

Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu 
435 440 445 

Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly 
450 455 460 

Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly 
465 470 475 480 

Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr 
485 490 495 

Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val 
500 505 510 

Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp 
515 520 525 

His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp 
530 535 540 

Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr 
545 550 555 560 

Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro 
565 570 575 

Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr 
580 585 590 

Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn 
595 600 605 

Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met 
610 615 620 

Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly 
625 630 635 640 

Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val 
645 650 655 

Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp 
660 665 670 

Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys 
675 680 685 

<210> 8 
<211> 19912 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pd.deltaNS3NS5 

<220> 

<221> CDS 

<222> (12745)..(18057) 

<400> 8 



atcgatccta 


ccccttgcgc 


taaagaagta 


tatgtgccta 


ctaacgcttg 


tctttgtctc 


60 


tgtcactaaa 


cactggatta 


ttactcccag 


atacttattt 


tggactaatt 


taaatgattt 


120 


cggatcaacg 


ttcttaatat 


cgctgaatct 


tccacaattg 


atgaaagtag 


ctaggaagag 


180 


gaattggtat 


aaagtttttg 


tttttgtaaa 


tctcgaagta 


tactcaaacg 


aatttagtat 


240 


tttctcagtg 


atctcccaga 


tgctttcacc 


ctcacttaga 


agtgctttaa 


gcattttttt 


300 


actgtggcta 


tttcccttat 


ctgcttcttc 


cgatgattcg 


aactgtaatt 


gcaaactact 


360 


tacaatatca 


gtgatatcag 


attgatgttt 


ttgtccatag 


taaggaataa 


ttgtaaattc 


420 


ccaagcagga 


atcaatttct 


ttaatgaggc 


ttccagaatt 


gttgcttttt 


gcgtcttgta 


480 


tttaaactgg 


agtgatttat 


tgacaatatc 


gaaactcagc 


gaattgctta 


tgatagtatt 


540 


atagctcatg 


aatgtggctc 


tcttgattgc 


tgttccgtta 


tgtgtaatca 


tccaacataa 


600 


ataggttagt 


tcagcagcac 


ataatgctat 


tttctcacct 


gaaggtcttt 


caaacctttc 


660 


cacaaactga 


cgaacaagca 


ccttaggtgg 


tgttttacat 


aatatatcaa 


attgtggcat 


720 


gcttagcgcc 


gatcttgtgt 


gcaattgata 


tctagtttca 


actactctat 


ttatcttgta 


780 


tcttgcagta 


ttcaaacacg 


ctaactcgaa 


aaactaactt 


taattgtcct 


gtttgtctcg 


840 


cgttctttcg 


aaaaatgcac 


cggccgcgca 


ttatttgtac 


tgcgaaaata 


attggtactg 


900 


cggtatcttc 


atttcatatt 


ttaaaaatgc 


acctttgctg 


cttttcctta 


atttttagac 


960 


ggcccgcagg 


ttcgttttgc 


ggtactatct 


tgtgataaaa 


agttgttttg 


acatgtgatc 


1020 
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tgcacagatt 


ttataatgta 


ataagcaaga 


atacattatc 


aaacgaacaa 


tactggtaaa 


1080 


agaaaaccaa 


aatggacgac 


attgaaacag 


ccaagaatct 


gacggtaaaa 


gcacgtacag 


1140 


cttatagcgt 


ctgggatgta 


tgtcggctgt 


ttattgaaat 


gattgctcct 


gatgtagata 


1200 


ttgatataga 


gagtaaacgt 


aagtctgatg 


agctactctt 


tccaggatat 


gtcataaggc 


1260 


ccatggaatc 


tctcacaacc 


ggtaggccgt 


atggtcttga 


ttctagcgca 


gaagattcca 


1320 


gcgtatcttc 


tgactccagt 


gctgaggtaa 


ttttgcctgc 


tgcgaagatg 


gttaaggaaa 


1380 


ggtttgattc 


gattggaaat 


ggtatgctct 


cttcacaaga 


agcaagtcag 


gctgccatag 


1440 


atttgatgct 


acagaataac 


aagctgttag 


acaatagaaa 


gcaactatac 


aaatctattg 


1500 


ctataataat 


aggaagattg 


cccgagaaag 


acaagaagag 


agctaccgaa 


atgctcatga 


1560 


gaaaaatgga 


ttgtacacag 


ttattagtcc 


caccagctcc 


aacggaagaa 


gatgttatga 


1620 


agctcgtaag 


cgtcgttacc 


caattgctta 


ctttagttcc 


accagatcgt 


caagctgctt 


1680 


taataggtga 


tttattcatc 


ccggaatctc 


taaaggatat 


attcaatagt 


ttcaatgaac 


1740 


tggcggcaga 


gaatcgttta 


cagcaaaaaa 


agagtgagtt 


ggaaggaagg 


actgaagtga 


1800 


accatgctaa 


tacaaatgaa 


gaagttccct 


ccaggcgaac 


aagaagtaga 


gacacaaatg 


1860 


caagaggagc 


atataaatta 


caaaacacca 


tcactgaggg 


ccctaaagcg 


gttcccacga 


1920 


aaaaaaggag 


agtagcaacg 


agggtaaggg 


gcagaaaatc 


acgtaatact 


tctagggtat 


1980 


gatccaatat 


caaaggaaat 


gatagcattg 


aaggatgaga 


ctaatccaat 


tgaggagtgg 


2040 


cagcatatag 


aacagctaaa 


gggtagtgct 


gaaggaagca 


tacgataccc 


cgcatggaat 


2100 


gggataatat 


cacaggaggt 


actagactac 


ctttcatcct 


acataaatag 


acgcatataa 


2160 


gtacgcattt 


aagcataaac 


acgcactatg 


ccgttcttct 


catgtatata 


tatatacagg 


2220 


caacacgcag 


atataggtgc 


gacgtgaaca 


gtgagctgta 


tgtgcgcagc 


tcgcgttgca 


2280 


ttttcggaag 


cgctcgtttt 


cggaaacgct 


ttgaagttcc 


tattccgaag 


ttcctattct 


2340 


ctagaaagta 


taggaacttc 


agagcgcttt 


tgaaaaccaa 


aagcgctctg 


aagacgcact 


2400 


ttcaaaaaac 


caaaaacgca 


ccggactgta 


acgagctact 


aaaatattgc 


gaataccgct 


2460 


tccacaaaca 


ttgctcaaaa 


gtatctcttt 


gctatatatc 


tctgtgctat 


atccctatat 


2520 


aacctaccca 


tccacctttc 


gctccttgaa 


cttgcatcta 


aactcgacct 


ctacatcaac 


2580 


aggcttccaa 


tgctcttcaa 


attttactgt 


caagtagacc 


catacggctg 


taatatgctg 


2640 


ctcttcataa 


tgtaagctta 


tctttatcga 


atcgtgtgaa 


aaactactac 


cgcgataaac 


2700 


ctttacggtt 


ccctgagatt 


gaattagttc 


ctttagtata 


tgatacaaga 


cacttttgaa 


2760 


ctttgtacga 


cgaattttga 


ggttcgccat 


cctctggcta 


tttccaatta 


tcctgtcggc 


2820 
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gaaacatgct 


gcttaaaact 


ccaagcggta 


ggagaccgat 


aaaggttaat 


aggacagccg 


2880 


tattatctcc 


gcctcagttt 


gatcttccgc 


ttcagactgc 


catttttcac 


ataatgaatc 


2940 


tatttcaccc 


cacaatcctt 


catccgcctc 


cgcatcttgt 


tccgttaaac 


tattgacttc 


3000 


atgttgtaca 


ttgtttagtt 


cacgagaagg 


gtcctcttca 


ggcggtagct 


cctgatctcc 


3060 


tatatgacct 


ttatcctgtt 


ctctttccac 


aaacttagaa 


atgtattcat 


gaattatgga 


3120 


gcacctaata 


acattcttca 


aggcggagaa 


gtttgggcca 


gatgcccaat 


atgcttgaca 


3180 


tgaaaacgtg 


agaatgaatt 


tagtattatt 


gtgatattct 


gaggcaattt 


tattataatc 


3240 


tcgaagataa 


gagaagaatg 


cagtgacctt 


tgtattgaca 


aatggagatt 


ccatgtatct 


3300 


aaaaaatacg 


cctttaggcc 


ttctgatacc 


ctttcccctg 


cggtttagcg 


tgccttttac 


3360 


attaatatct 


aaaccctctc 


cgatggtggc 


ctttaactga 


ctaataaatg 


caaccgatat 


3420 


aaactgtgat 


aattctgggt 


gatttatgat 


tcgatcgaca 


attgtattgt 


acactagtgc 


3480 


aggatcaggc 


caatccagtt 


ctttttcaat 


taccggtgtg 


tcgtctgtat 


tcagtacatg 


3540 


tccaacaaat 


gcaaatgcta 


acgttttgta 


tttcttataa 


ttgtcaggaa 


ctggaaaagt 


3600 


cccccttgtc 


gtctcgatta 


cacacctact 


ttcatcgtac 


accataggtt 


ggaagtgctg 


3660 


cataatacat 


tgcttaatac 


aagcaagcag 


tctctcgcca 


ttcatatttc 


agttattttc 


3720 


cattacagct 


gatgtcattg 


tatatcagcg 


ctgtaaaaat 


ctatctgtta 


cagaaggttt 


3780 


tcgcggtttt 


tataaacaaa 


actttcgtta 


cgaaatcgag 


caatcacccc 


agctgcgtat 


3840 


ttggaaattc 


gggaaaaagt 


agagcaacgc 


gagttgcatt 


ttttacacca 


taatgcatga 


3900 


ttaacttcga 


gaagggatta 


aggctaattt 


cactagtatg 


tttcaaaaac 


ctcaatctgt 


3960 


ccattgaatg 


ccttataaaa 


cagctataga 


ttgcatagaa 


gagttagcta 


ctcaatgctt 


4020 


tttgtcaaag 


cttactgatg 


atgatgtgtc 


tactttcagg 


cgggtctgta 


gtaaggagaa 


4080 


tgacattata 


aagctggcac 


ttagaattcc 


acggactata 


gactatacta 


gtatactccg 


4140 


tctactgtac 


gatacacttc 


cgctcaggtc 


cttgtccttt 


aacgaggcct 


taccactctt 


4200 


ttgttactct 


attgatccag 


ctcagcaaag 


gcagtgtgat 


ctaagattct 


atcttcgcga 


4260 


tgtagtaaaa 


ctagctagac 


cgagaaagag 


actagaaatg 


caaaaggcac 


ttctacaatg 


4320 


gctgccatca 


ttattatccg 


atgtgacgct 


gcattttttt 


tttttttttt 


tttttttttt 


4380 


tttttttttt 


tttttttttt 


ttttttggta 


caaatatcat 


aaaaaaagag 


aatcttttta 


4440 


agcaaggatt 


ttcttaactt 


cttcggcgac 


agcatcaccg 


acttcggtgg 


tactgttgga 


4500 


accacctaaa 


tcaccagttc 


tgatacctgc 


atccaaaacc 


tttttaactg 


catcttcaat 


4560 


ggctttacct 


tcttcaggca 


agttcaatga 


caatttcaac 


atcattgcag 


cagacaagat 


4620


agtggcgata


gggttgacct 


tattctttgg 


caaatctgga 


gcggaaccat 


ggcatggttc 


4680 


gtacaaacca 


aatgcggtgt 


tcttgtctgg 


caaagaggcc 


aaggacgcag 


atggcaacaa 


4740 


acccaaggag 


cctgggataa 


cggaggcttc 


atcggagatg 


atatcaccaa 


acatgttgct 


4800 


ggtgattata 


ataccattta 


ggtgggttgg 


gttcttaact 


aggatcatgg 


cggcagaatc 


4860 


aatcaattga 


tgttgaactt 


tcaatgtagg 


gaattcgttc 


ttgatggttt 


cctccacagt 


4920 


ttttctccat 


aatcttgaag 


aggccaaaac 


attagcttta 


tccaaggacc 


aaataggcaa 


4980 


tggtggctca 


tgttgtaggg 


ccatgaaagc 


ggccattctt 


gtgattcttt 


gcacttctgg 


5040 


aacggtgtat 


tgttcactat 


cccaagcgac 


accatcacca 


tcgtcttcct 


ttctcttacc 


5100 


aaagtaaata 


cctcccacta 


attctctaac 


aacaacgaag 


tcagtacctt 


tagcaaattg 


5160 


tggcttgatt 


ggagataagt 


ctaaaagaga 


gtcggatgca 


aagttacatg 


gtcttaagtt 


5220 


ggcgtacaat 


tgaagttctt 


tacggatttt 


tagtaaacct 


tgttcaggtc 


taacactacc 


5280 


ggtaccccat 


ttaggaccac 


ccacagcacc 


taacaaaacg 


gcatcagcct 


tcttggaggc 


5340 


ttccagcgcc 


tcatctggaa 


gtggaacacc 


tgtagcatcg 


atagcagcac 


caccaattaa 


5400 


atgattttcg 


aaatcgaact 


tgacattgga 


acgaacatca 


gaaatagctt 


taagaacctt 


5460 


aatggcttcg 


gctgtgattt 


cttgaccaac 


gtggtcacct 


ggcaaaacga 


cgatcttctt 


5520 


aggggcagac 


attacaatgg 


tatatccttg 


aaatatatat 


aaaaaaaaaa 


aaaaaaaaaa 


5580 


aaaaaaaaaa 


atgcagcttc 


tcaatgatat 


tcgaatacgc 


tttgaggaga 


tacagcctaa 


5640 


tatccgacaa 


actgttttac 


agatttacga 


tcgtacttgt 


tacccatcat 


tgaattttga 


5700 


acatccgaac 


ctgggagttt 


tccctgaaac 


agatagtata 


tttgaacctg 


tataataata 


5760 


tatagtctag 


cgctttacgg 


aagacaatgt 


atgtatttcg 


gttcctggag 


aaactattgc 


5820 


atctattgca 


taggtaatct 


tgcacgtcgc 


atccccggtt 


cattttctgc 


gtttccatct 


5880 


tgcacttcaa 


tagcatatct 


ttgttaacga 


agcatctgtg 


cttcattttg 


tagaacaaaa 


5940 


atgcaacgcg 


agagcgctaa 


tttttcaaac 


aaagaatctg 


agctgcattt 


ttacagaaca 


6000 


gaaatgcaac 


gcgaaagcgc 


tattttacca 


acgaagaatc 


tgtgcttcat 


ttttgtaaaa 


6060 


caaaaatgca 


acgcgagagc 


gctaattttt 


caaacaaaga 


atctgagctg 


catttttaca 


6120 


gaacagaaat 


gcaacgcgag 


agcgctattt 


taccaacaaa 


gaatctatac 


ttcttttttg 


6180 


ttctacaaaa 


atgcatcccg 


agagcgctat 


ttttctaaca 


aagcatctta 


gattactttt 


6240 


tttctccttt 


gtgcgctcta 


taatgcagtc 


tcttgataac 


tttttgcact 


gtaggtccgt 


6300 


taaggttaga 


agaaggctac 


tttggtgtct 


attttctctt 


ccataaaaaa 


agcctgactc 


6360 


cacttcccgc 


gtttactgat 


tactagcgaa 


gctgcgggtg 


cattttttca 


agataaaggc 


6420 



WO 01/38360 PCT/US00/32326 



atccccgatt 


atattctata 


ccgatgtgga 


ttgcgcatac 


tttgtgaaca 


gaaagtgata 


6480 


gcgttgatga 


ttcttcattg 


gtcagaaaat 


tatgaacggt 


ttcttctatt 


ttgtctctat 


6540 


atactacgta 


taggaaatgt 


ttacattttc 


gtattgtttt 


cgattcactc 


tatgaatagt 


6600 


tcttactaca 


atttttttgt 


ctaaagagta 


atactagaga 


taaacataaa 


aaatgtagag 


6660 


gtcgagttta 


gatgcaagtt 


caaggagcga 


aaggtggatg 


ggtaggttat 


atagggatat 


6720 


agcacagaga 


tatatagcaa 


agagatactt 


ttgagcaatg 


tttgtggaag 


cggtattcgc 


6780 


aatattttag 


tagctcgtta 


cagtccggtg 


cgtttttggt 


tttttgaaag 


tgcgtcttca 


6840 


gagcgctttt 


ggttttcaaa 


agcgctctga 


agttcctata 


ctttctagag 


aataggaact 


6900 


tcggaatagg 


aacttcaaag 


cgtttccgaa 


aacgagcgct 


tccgaaaatg 


caacgcgagc 


6960 


tgcgcacata 


cagctcactg 


ttcacgtcgc 


acctatatct 


gcgtgttgcc 


tgtatatata 


7020 


tatacatgag 


aagaacggca 


tagtgcgtgt 


ttatgcttaa 


atgcgtactt 


atatgcgtct 


7080 


atttatgtag 


gatgaaaggt 


agtctagtac 


ctcctgtgat 


attatcccat 


tccatgcggg 


7140 


gtatcgtatg 


cttccttcag 


cactaccctt 


tagctgttct 


atatgctgcc 


actcctcaat 


7200 


tggattagtc 


tcatccttca 


atgctatcat 


ttcctttgat 


attggatcat 


atgcatagta 


7260 


ccgagaaact 


agtgcgaagt 


agtgatcagg 


tattgctgtt 


atctgatgag 


tatacgttgt 


7320 


cctggccacg 


gcagaagcac 


gcttatcgct 


ccaatttccc 


acaacattag 


tcaactccgt 


7380 


taggcccttc 


attgaaagaa 


atgaggtcat 


caaatgtctt 


ccaatgtgag 


attttgggcc 


7440 


attttttata 


gcaaagattg 


aataaggcgc 


atttttcttc 


aaagctttat 


tgtacgatct 


7500 


gactaagtta 


tcttttaata 


attggtattc 


ctgtttattg 


cttgaagaat 


tgccggtcct 


7560 


atttactcgt 


tttaggactg 


gttcagaatt 


cctcaaaaat 


tcatccaaat 


atacaagtgg 


7620 


atcgatgata 


agctgtcaaa 


catgagaatt 


cttgaagacg 


aaagggcctc 


gtgatacgcc 


7680 


tatttttata 


ggttaatgtc 


atgataataa 


tggtttctta 


gacgtcaggt 


ggcacttttc 


7740 


ggggaaatgt 


gcgcggaacc 


cctatttgtt 


tatttttcta 


aatacattca 


aatatgtatc 


7800 


cgctcatgag 


acaataaccc 


tgataaatgc 


ttcaataata 


ttgaaaaagg 


aagagtatga 


7860 


gtattcaaca 


tttccgtgtc 


gcccttattc 


ccttttttgc 


ggcattttgc 


cttcctgttt 


7920 


ttgctcaccc 


agaaacgctg 


gtgaaagtaa 


aagatgctga 


agatcagttg 


ggtgcacgag 


7980 


tgggttacat 


cgaactggat 


ctcaacagcg 


gtaagatcct 


tgagagtttt 


cgccccgaag 


8040 


aacgttttcc 


aatgatgagc 


acttttaaag 


ttctgctatg 


tggcgcggta 


ttatcccgtg 


8100 


ttgacgccgg 


gcaagagcaa 


ctcggtcgcc 


gcatacacta 


ttctcagaat 


gacttggttg 


8160 


agtactcacc 


agtcacagaa 


aagcatctta 


cggatggcat 


gacagtaaga 


gaattatgca 


8220


gtgctgccat


aaccatgagt 


gataacactg 


cggccaactt 


acttctgaca 


acgatcggag 


8280 


gaccgaagga 


gctaaccgct 


tttttgcaca 


acatggggga 


tcatgtaact 


cgccttgatc 


8340 


gttgggaacc 


ggagctgaat 


gaagccatac 


caaacgacga 


gcgtgacacc 


acgatgcctg 


8400 


cagcaatggc 


aacaacgttg 


cgcaaactat 


taactggcga 


actacttact 


ctagcttccc 


8460 


ggcaacaatt 


aatagactgg 


atggaggcgg 


ataaagttgc 


aggaccactt 


ctgcgctcgg 


8520 


cccttccggc 


tggctggttt 


attgctgata 


aatctggagc 


cggtgagcgt 


gggtctcgcg 


8580 


gtatcattgc 


agcactgggg 


ccagatggta 


agccctcccg 


tatcgtagtt 


atctacacga 


8640 


cggggagtca 


ggcaactatg 


gatgaacgaa 


atagacagat 


cgctgagata 


ggtgcctcac 


8700 


tgattaagca 


ttggtaactg 


tcagaccaag 


tttactcata 


tatactttag 


attgatttaa 


8760 


aacttcattt 


ttaatttaaa 


aggatctagg 


tgaagatcct 


ttttgataat 


ctcatgacca 


8820 


aaatccctta 


acgtgagttt 


tcgttccact 


gagcgtcaga 


ccccgtagaa 


aagatcaaag 


8880 


gatcttcttg 


agatcctttt 


tttctgcgcg 


taatctgctg 


cttgcaaaca 


aaaaaaccac 


8940 


cgctaccagc 


ggtggtttgt 


ttgccggatc 


aagagctacc 


aactcttttt 


ccgaaggtaa 


9000 


ctggcttcag 


cagagcgcag 


ataccaaata 


ctgtccttct 


agtgtagccg 


tagttaggcc 


9060 


accacttcaa 


gaactctgta 


gcaccgccta 


catacctcgc 


tctgctaatc 


ctgttaccag 


9120 


tggctgctgc 


cagtggcgat 


aagtcgtgtc 


ttaccgggtt 


ggactcaaga 


cgatagttac 


9180 


cggataaggc 


gcagcggtcg 


ggctgaacgg 


ggggttcgtg 


cacacagccc 


agcttggagc 


9240 


gaacgaccta 


caccgaactg 


agatacctac 


agcgtgagct 


atgagaaagc 


gccacgcttc 


9300 


ccgaagggag 


aaaggcggac 


aggtatccgg 


taagcggcag 


ggtcggaaca 


ggagagcgca 


9360 


cgagggagct 


tccaggggga 


aacgcctggt 


atctttatag 


tcctgtcggg 


tttcgccacc 


9420 


tctgacttga 


gcgtcgattt 


ttgtgatgct 


cgtcaggggg 


gcggagccta 


tggaaaaacg 


9480 


ccagcaacgc 


ggccttttta 


cggttcctgg 


ccttttgctg 


gccttttgct 


cacatgttct 


9540 


ttcctgcgtt 


atcccctgat 


tctgtggata 


accgtattac 


cgcctttgag 


tgagctgata 


9600 


ccgctcgccg 


cagccgaacg 


accgagcgca 


gcgagtcagt 


gagcgaggaa 


gcggaagagc 


9660 


gcctgatgcg 


gtattttctc 


cttacgcatc 


tgtgcggtat 


ttcacaccgc 


atatggtgca 


9720 


ctctcagtac 


aatctgctct 


gatgccgcat 


agttaagcca 


gtatacactc 


cgctatcgct 


9780 


acgtgactgg 


gtcatggctg 


cgccccgaca 


cccgccaaca 


cccgctgacg 


cgccctgacg 


9840 


ggcttgtctg 


ctcccggcat 


ccgcttacag 


acaagctgtg 


accgtctccg 


ggagctgcat 


9900 


gtgtcagagg 


ttttcaccgt 


catcaccgaa 


acgcgcgagg 


cagctgcggt 


aaagctcatc 


9960 


agcgtggtcg 


tgaagcgatt 


cacagatgtc 


tgcctgttca 


tccgcgtcca 


gctcgttgag 


10020


tttctccaga


agcgttaatg 


tctggcttct 


gataaagcgg 


gccatgttaa 


gggcggtttt 


10080 


ttcctgtttg 


gtcactgatg 


cctccgtgta 


agggggattt 


ctgttcatgg 


gggtaatgat 


10140 


accgatgaaa 


cgagagagga 


tgctcacgat 


acgggttact 


gatgatgaac 


atgcccggtt 


10200 


actggaacgt 


tgtgagggta 


aacaactggc 


ggtatggatg 


cggcgggacc 


agagaaaaat 


10260 


cactcagggt 


caatgccagc 


gcttcgttaa 


tacagatgta 


ggtgttccac 


agggtagcca 


10320 


gcagcatcct 


gcgatgcaga 


tccggaacat 


aatggtgcag 


ggcgctgact 


tccgcgtttc 


10380 


cagactttac 


gaaacacgga 


aaccgaagac 


cattcatgtt 


gttgctcagg 


tcgcagacgt 


10440 


tttgcagcag 


cagtcgcttc 


acgttcgctc 


gcgtatcggt 


gattcattct 


gctaaccagt 


10500 


aaggcaaccc 


cgccagccta 


gccgggtcct 


caacgacagg 


agcacgatca 


tgcgcacccg 


10560 


tggccaggac 


ccaacgctgc 


ccgagatgcg 


ccgcgtgcgg 


ctgctggaga 


tggcggacgc 


10620 


gatggatatg 


ttctgccaag 


ggttggtttg 


cgcattcaca 


gttctccgca 


agaattgatt 


10680 


ggctccaatt 


cttggagtgg 


tgaatccgtt 


agcgaggtgc 


cgccggcttc 


cattcaggtc 


10740 


gaggtggccc 


ggctccatgc 


accgcgacgc 


aacgcgggga 


ggcagacaag 


gtatagggcg 


10800 


gcgcctacaa 


tccatgccaa 


cccgttccat 


gtgctcgccg 


aggcggcata 


aatcgccgtg 


10860 


acgatcagcg 


gtccaatgat 


cgaagttagg 


ctggtaagag 


ccgcgagcga 


tccttgaagc 


10920 


tgtccctgat 


ggtcgtcatc 


tacctgcctg 


gacagcatgg 


cctgcaacgc 


gggcatcccg 


10980 


atgccgccgg 


aagcgagaag 


aatcataatg 


gggaaggcca 


tccagcctcg 


cgtcgcgaac 


11040 


gccagcaaga 


cgtagcccag 


cgcgtcggcc 


gccatgccgg 


cgataatggc 


ctgcttctcg 


11100 


ccgaaacgtt 


tggtggcggg 


accagtgacg 


aaggcttgag 


cgagggcgtg 


caagattccg 


11160 


aataccgcaa 


gcgacaggcc 


gatcatcgtc 


gcgctccagc 


gaaagcggtc 


ctcgccgaaa 


11220 


atgacccaga 


gcgctgccgg 


cacctgtcct 


acgagttgca 


tgataaagaa 


gacagtcata 


11280 


agtgcggcga 


cgatagtcat 


gccccgcgcc 


caccggaagg 


agctgactgg 


gttgaaggct 


11340 


ctcaagggca 


tcggtcgagg 


atccttcaat 


atgcgcacat 


acgctgttat 


gttcaaggtc 


11400 


ccttcgttta 


agaacgaaag 


cggtcttcct 


tttgagggat 


gtttcaagtt 


gttcaaatct 


11460 


atcaaatttg 


caaatcccca 


gtctgtatct 


agagcgttga 


atcggtgatg 


cgatttgtta 


11520 


attaaattga 


tggtgtcacc 


attaccaggt 


ctagatatac 


caatggcaaa 


ctgagcacaa 


11580 


caataccagt 


ccggatcaac 


tggcaccatc 


tctcccgtag 


tctcatctaa 


tttttcttcc 


11640 


ggatgaggtt 


ccagatatac 


cgcaacacct 


ttattatggt 


ttccctgagg 


gaataataga 


11700 


atgtcccatt 


cgaaatcacc 


aattctaaac 


ctgggcgaat 


tgtatttcgg 


gtttgttaac 


11760 


tcgttccagt 


caggaatgtt 


ccacgtgaag 


ctatcttcca 


gcaaagtctc 


cacttcttca 


11820


tcaaattgtg


gagaatactc 


ccaatgctct 


tatctatggg 


acttccggga


aacacagtac 


11880 


cgatacttcc 


caattcgtct 


tcagagctca 


ttgtttgttt 


gaagagacta 


atcaaagaat 


11940 


cgttttctca 


aaaaaattaa 


tatcttaact 


gatagtttga 


tcaaaggggc 


aaaacgtagg


12000 


ggcaaacaaa 


cggaaaaatc 


gtttctcaaa 


ttttctgatg 


ccaagaactc 


taaccagtct 


12060 


tatctaaaaa 


ttgccttatg 


atccgtctct 


ccggttacag 


cctgtgtaac 


tgattaatcc 


12120 


tgcctttcta 


atcaccattc 


taatgtttta 


attaagggat 


tttgtcttca 


ttaacggctt


12180 


tcgctcataa 


aaatgttatg 


acgttttgcc 


cgcaggcggg 


aaaccatcca 


cttcacgaga 


12240 


ctgatctcct 


ctgccggaac 


accgggcatc 


tccaacttat 


aagttggaga 


aataagagaa 


12300 


tttcagattg 


agagaatgaa 


aaaaaaaaac 


ccttagttca 


taggtccatt 


ctcttagcgc 


12360 


aactacagag 


aacaggggca 


caaacaggca 


aaaaacgggc


acaacctcaa 


tggagtgatg 


12420 


caacctgcct 


ggagtaaatg 


atgacacaag 


gcaattgacc 


cacgcatgta 


tctatctcat 


12480 


tttcttacac 


cttctattac 


cttctgctct 


ctctgatttg 


gaaaaagctg 


aaaaaaaagg 


12540 


ttgaaaccag 


ttccctgaaa 


ttattcccct 


acttgactaa 


taagtatata 


aagacggtag


12600 


gtattgattg 


taattctgta 


aatctatttc 


ttaaacttct 


taaattctac 


ttttatagtt 


12660 


agtctttttt 


ttagttttaa 


aacaccaaga 


acttagtttc 


gaataaacac 


acataaacaa 


12720 


acaagcttac 


aaaacaaatt 


cacc atg gct gca tat gca gct cag ggc tat 12771
Met Ala Ala Tyr Ala Ala Gln Gly Tyr
1 5

aag gtg cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt 12819
Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly
10 15 20 25

gct tac atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg 12867
Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly
30 35 40

gtg aga aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc 12915
Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly
45 50 55

aag ttc ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata 12963
Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile
60 65 70

att tgt gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att 13011
Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile
75 80 85

ggc act gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg 13059
Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val
90 95 100 105 

ctc gcc acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac 13107
Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn
110 115 120

atc gag gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc 13155
Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly
125 130 135

aag gct atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc 13203
Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe
140 145 150

tgt cat tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca 13251
Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala
155 160 165

ttg ggc atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc 13299
Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val
170 175 180 185

atc ccg acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg 13347
Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met
190 195 200

acc ggc tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt 13395
Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys
205 210 215

gtc acc cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag 13443
Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu
220 225 230

aca atc acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc 13491
Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly
235 240 245

agg act ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg 13539
Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly
250 255 260 265

gag cgc ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat 13587
Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr
270 275 280

gac gca ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt 13635
Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val
285 290 295

agg cta cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac 13683
Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp
300 305 310

cat ctt gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat 13731
His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp
315 320 325

gcc cac ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac 13779
Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr
330 335 340 345

ctg gta gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc 13827
Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro
350 355 360

cca tcg tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc 13875
Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr
365 370 375

ctc cat ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat 13923
Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn
380 385 390

gaa atc acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg 13971
Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met
395 400 405

tcg gcc gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc 14019
Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly
410 415 420 425

gtc ctg gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc 14067
Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val
430 435 440

ata gtg ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac 14115
Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp
445 450 455

agg gaa gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag 14163
Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln
460 465 470

cac tta ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag 14211
His Leu Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys
475 480 485

cag aag gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt 14259
Gln Lys Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val
490 495 500 505

atc gcc cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg 14307
Ile Ala Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp
510 515 520

gcg aag cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc 14355
Ala Lys His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly
525 530 535

ttg tca acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt 14403
Leu Ser Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe
540 545 550

aca gct gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc 14451
Thr Ala Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe
555 560 565

aac ata ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc 14499
Asn Ile Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala
570 575 580 585

gct act gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt 14547
Ala Thr Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser
590 595 600

gtt gga ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg 14595
Val Gly Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala
605 610 615

ggc gtg gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc 14643
Gly Val Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val
620 625 630

ccc tcc acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc 14691
Pro Ser Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro
635 640 645

gga gcc ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac 14739
Gly Ala Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His
650 655 660 665

gtt ggc ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc 14787
Val Gly Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala
670 675 680

ttc gcc tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag 14835
Phe Ala Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu
685 690 695

agc gat gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta 14883
Ser Asp Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val
700 705 710

acc cag ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc 14931
Thr Gln Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr
715 720 725

act cca tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc 14979
Thr Pro Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys
730 735 740 745

gag gtg ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca 15027
Glu Val Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro
750 755 760

cag ctg cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg 15075
Gln Leu Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly
765 770 775

gtc tgg cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct 15123
Val Trp Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala
780 785 790

gag atc act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct 15171
Glu Ile Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro
795 800 805

agg acc tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac 15219
Arg Thr Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr
810 815 820 825

acc acg ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg 15267
Thr Thr Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala
830 835 840

cta tgg agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg 15315
Leu Trp Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly
845 850 855

gac ttc cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg 15363
Asp Phe His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro
860 865 870

tgc cag gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc 15411
Cys Gln Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg
875 880 885

cta cat agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta 15459
Leu His Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val
890 895 900 905

tca ttc aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct 15507
Ser Phe Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro
910 915 920

tgc gag ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat 15555
Cys Glu Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp
925 930 935

ccc tcc cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga 15603
Pro Ser His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly
940 945 950

tca ccc ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca 15651
Ser Pro Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro
955 960 965

tct ctc aag gca act tgc acc gct aac cat gac tcc cct gat gct gag 15699
Ser Leu Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu
970 975 980 985

ctc ata gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc 15747
Leu Ile Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile
990 995 1000

acc agg gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat 15795
Thr Arg Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp
1005 1010 1015

ccg ctt gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa 15843
Pro Leu Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu
1020 1025 1030

atc ctg cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg 15891
Ile Leu Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala
1035 1040 1045

cgg ccg gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac 15939
Arg Pro Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp
1050 1055 1060 1065

tac gaa cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc 15987
Tyr Glu Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser
1070 1075 1080

cct cct gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa 16035
Pro Pro Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu
1085 1090 1095

tca acc cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc 16083
Ser Thr Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly
1100 1105 1110

agc tcc tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct 16131
Ser Ser Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser
1115 1120 1125

gag ccc gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat 16179
Glu Pro Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr
1130 1135 1140 1145

tcc tcc atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc 16227
Ser Ser Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser
1150 1155 1160

gac ggg tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc 16275
Asp Gly Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val
1165 1170 1175

gtg tgc tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg 16323
Val Cys Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro
1180 1185 1190

tgc gcc gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg 16371
Cys Ala Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser
1195 1200 1205

ttg cta cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct 16419
Leu Leu Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala
1210 1215 1220 1225

tgc caa agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac 16467
Cys Gln Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp
1230 1235 1240

agc cat tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa 16515
Ser His Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys
1245 1250 1255

gtg aag gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc 16563
Val Lys Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro
1260 1265 1270

cca cac tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt 16611
Pro His Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg
1275 1280 1285

tgc cat gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac 16659
Cys His Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp
1290 1295 1300 1305

ctt ctg gaa gac aat gta aca cca ata gac act acc atc atg gct aag 16707
Leu Leu Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys
1310 1315 1320

aac gag gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct 16755
Asn Glu Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala
1325 1330 1335

cgt ctc atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg 16803
Arg Leu Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met
1340 1345 1350

gct ttg tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc 16851
Ala Leu Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser
1355 1360 1365

tcc tac gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg 16899
Ser Tyr Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val
1370 1375 1380 1385

caa gcg tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc 16947
Gln Ala Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr
1390 1395 1400

cgc tgc ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag 16995
Arg Cys Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu
1405 1410 1415

gca atc tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc 17043
Ala Ile Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile
1420 1425 1430

aag tcc ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca 17091
Lys Ser Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser
1435 1440 1445

agg ggg gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg 17139
Arg Gly Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu
1450 1455 1460 1465

aca act agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca 17187
Thr Thr Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala
1470 1475 1480

gcc tgt cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc 17235
Ala Cys Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly
1485 1490 1495

gac gac tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg 17283
Asp Asp Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala
1500 1505 1510

gcg agc ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc 17331
Ala Ser Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro
1515 1520 1525

cct ggg gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca 17379
Pro Gly Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser
1530 1535 1540 1545

tgc tcc tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc 17427
Cys Ser Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val
1550 1555 1560

tac tac ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg 17475
Tyr Tyr Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp
1565 1570 1575

gag aca gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc 17523
Glu Thr Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile
1580 1585 1590

atg ttt gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc 17571
Met Phe Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe
1595 1600 1605

ttt agc gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc 17619
Phe Ser Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys
1610 1615 1620 1625

gag atc tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca 17667
Glu Ile Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro
1630 1635 1640

atc att caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac 17715
Ile Ile Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr
1645 1650 1655

tct cca ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg 17763
Ser Pro Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly
1660 1665 1670

gta ccg ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct 17811
Val Pro Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala
1675 1680 1685

agg ctt ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc 17859
Arg Leu Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu
1690 1695 1700 1705

ttc aac tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc 17907
Phe Asn Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala
1710 1715 1720

gct ggc cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg 17955
Ala Gly Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly
1725 1730 1735

gga gac att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg 18003
Gly Asp Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp
1740 1745 1750

ttt tgc cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc 18051
Phe Cys Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro
1755 1760 1765

aac cga tgaaggttgg ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaac 18107
Asn Arg
1770


ccgagtcgac


tttgttccca 


ctgtactttt 


agctcgtaca 


aaatacaata 


tacttttcat 


18167 


ttctccgtaa 


acaacatgtt 


ttcccatgta 


atatcctttt 


ctatttttcg 


ttccgttacc 


18227 


aactttacac 


atactttata 


tagctattca 


cttctataca 


ctaaaaaact 


aagacaattt 


18287 


taattttgct 


gcctgccata 


tttcaatttg 


ttataaattc 


ctataattta 


tcctattagt 


18347 


agctaaaaaa 


agatgaatgt 


gaatcgaatc 


ctaagagaat 


tggatctgat 


ccacaggacg 


18407 


ggtgtggtcg 


ccatgatcgc 


gtagtcgata 


gtggctccaa 


gtagcgaagc 


gagcaggact 


18467 


gggcggcggc 


caaagcggtc 


ggacagtgct 


ccgagaacgg 


gtgcgcatag 


aaattgcatc 


18527 


aacgcatata 


gcgctagcag 


cacgccatag 


tgactggcga 


tgctgtcgga 


atggacgata 


18587 


tcccgcaaga 


ggcccggcag 


taccggcata 


accaagccta 


tgcctacagc 


atccagggtg 


18647 


acggtgccga 


ggatgacgat 


gagcgcattg 


ttagatttca 


tacacggtgc 


ctgactgcgt 


18707 


tagcaattta 


actgtgataa 


actaccgcat 


taaagctttt 


tctttccaat 


tttttttttt 


18767 


tcgtcattat 


aaaaatcatt 


acgaccgaga 


ttcccgggta 


ataactgata 


taattaaatt 


18827 


gaagctctaa 


tttgtgagtt 


tagtatacat 


gcatttactt 


ataatacagt 


tttttagttt 


18887 


tgctggccgc 


atcttctcaa 


atatgcttcc 


cagcctgctt 


ttctgtaacg 


ttcaccctct 


18947 


accttagcat 


cccttccctt 


tgcaaatagt 


cctcttccaa 


caataataat 


gtcagatcct 


19007 


gtagagacca 


catcatccac 


ggttctatac 


tgttgaccca 


atgcgtctcc 


cttgtcatct 


19067 


aaacccacac 


cgggtgtcat 


aatcaaccaa 


tcgtaacctt 


catctcttcc 


acccatgtct 


19127 


ctttgagcaa 


taaagccgat 


aacaaaatct 


ttgtcgctct 


tcgcaatgtc 


aacagtaccc 


19187 


ttagtatatt 


ctccagtaga 


tagggagccc 


ttgcatgaca 


attctgctaa 


catcaaaagg 


19247 


cctctaggtt 


cctttgttac 


ttcttctgcc 


gcctgcttca 


aaccgctaac 


aatacctggg 


19307 


cccaccacac 


cgtgtgcatt 


cgtaatgtct 


gcccattctg 


ctattctgta 


tacacccgca 


19367 


gagtactgca 


atttgactgt 


attaccaatg 


tcagcaaatt 


ttctgtcttc 


gaagagtaaa 


19427 


aaattgtact 


tggcggataa 


tgcctttagc 


ggcttaactg 


tgccctccat 


ggaaaaatca 


19487 


gtcaagatat 


ccacatgtgt 


ttttagtaaa 


caaattttgg 


gacctaatgc 


ttcaactaac 


19547 


tccagtaatt 


ccttggtggt 


acgaacatcc 


aatgaagcac 


acaagtttgt 


ttgcttttcg 


19607 


tgcatgatat 


taaatagctt 


ggcagcaaca 


ggactaggat 


gagtagcagc 


acgttcctta 


19667 


tatgtagctt 


tcgacatgat 


ttatcttcgt 


ttcctgcagg 


tttttgttct 


gtgcagttgg 


19727 


gttaagaata 


ctgggcaatt 


tcatgtttct 


tcaacactac 


atatgcgtat 


atataccaat 


19787 


ctaagtctgt 


gctccttcct 


tcgttcttcc 


ttctgttcgg 


agattaccga 


atcaaaaaaa 


19847 


tttcaaggaa 


accgaaatca 


aaaaaaagaa 


taaaaaaaaa 


atgatgaatt 


gaaaagctta 


19907 
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tcgat 



19912 



<210> 9 
<211> 1771 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pd.deltaNS3NS5 



<400> 9

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255
Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575
Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895
Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215
Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535
Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770



<210> 10 
<211> 19798 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pd.deltaNS3NS5.pj

<220>
<221> CDS

<222> (12679)..(17991)
<400> 10

atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60

tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120

cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180

gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240

tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300

actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360

tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420

ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480

tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540

atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600

ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660

cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720

gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780

tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840

cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900

cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960

ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020

tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080

agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140

cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200

ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260

ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320

gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380

ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440

dCL ugaugcc acagaacaac aagccgttag acaatagaaa gcaactatac aaacctattg 1500

ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560

gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620

agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680

taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800

accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860

caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920

aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980

gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040

cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100

gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160

gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220

caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280

ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340

ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400

ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460

tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520

aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580

aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640

ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700

ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760

ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820

tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880

tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940

atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000

tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060

gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120

tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180

tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240

aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300

attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360

aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420

aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480

tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600

cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660

cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720

tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780

ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840

ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900

ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960

tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020

tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080

tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140

ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200

tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260

gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320

tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380

agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440

accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500

ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560

agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620

gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680

acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740

ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800

aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860

ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920

tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980

aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040

aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100

tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160

ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220

ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280

ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400

aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460

aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520

aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580

tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640

acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700

tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760

atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820

tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880

atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940

gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000

caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060

gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120

ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180

tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240

taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300

cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360

atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420

gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480

atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540

tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600

gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660

agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720

aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780

gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840

tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900

tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960

tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020

atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080

gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200

ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260

cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320

taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380

attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440

gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500

atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560

atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620

tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680

ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740

cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800

gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860

ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920

tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980

aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040

ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100

agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160

gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220

gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280

gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340

cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400

ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460

cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520

gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580

cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640

tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700

aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760

aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820

gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880

cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000

accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060

tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120

cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180

gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240

ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300

cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360

tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420

ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480

ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540

ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600

gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660

ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720

acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780

ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840

gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900

agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960

tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020

ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080

accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140

actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200

cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260

gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320

cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380

tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440

aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500

tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560

gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620

ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680

gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800

acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860

tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920

atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980

gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040

ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100

aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160

atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220

agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280

ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340

ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400

atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460

attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520

caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580

ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640

atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700

tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760

tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820

cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880

cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940

ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000

tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060

tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120

tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180

ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240

tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300

aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360

caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420

tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480

ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540
gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600

agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660

acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215
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cag aca gtc gat ttc age ctt gac cct acc ttc acc att gag aca ate 13383 
Gin Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr lie Glu Thr lie 
220 225 230 235 

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

tgaatagtcg actttgttcc cactgtactt ttagctcgta caaaatacaa tatacttttc 18051 

atttctccgt aaacaacatg ttttcccatg taatatcctt ttctattttt cgttccgtta 18111 

ccaactttac acatacttta tatagctatt cacttctata cactaaaaaa ctaagacaat 18171 

tttaattttg ctgcctgcca tatttcaatt tgttataaat tcctataatt tatcctatta 18231 

gtagctaaaa aaagatgaat gtgaatcgaa tcctaagaga attggatctg atccacagga 18291 

cgggtgtggt cgccatgatc gcgtagtcga tagtggctcc aagtagcgaa gcgagcagga 18351

ctgggcggcg gccaaagcgg tcggacagtg ctccgagaac gggtgcgcat agaaattgca 18411

tcaacgcata tagcgctagc agcacgccat agtgactggc gatgctgtcg gaatggacga 18471

tatcccgcaa gaggcccggc agtaccggca taaccaagcc tatgcctaca gcatccaggg 18531

tgacggtgcc gaggatgacg atgagcgcat tgttagattt catacacggt gcctgactgc 18591

gttagcaatt taactgtgat aaactaccgc attaaagctt tttctttcca attttttttt 18651

tttcgtcatt ataaaaatca ttacgaccga gattcccggg taataactga tataattaaa 18711

ttgaagctct aatttgtgag tttagtatac atgcatttac ttataataca gttttttagt 18771

tttgctggcc gcatcttctc aaatatgctt cccagcctgc ttttctgtaa cgttcaccct 18831

ctaccttagc atcccttccc tttgcaaata gtcctcttcc aacaataata atgtcagatc 18891

ctgtagagac cacatcatcc acggttctat actgttgacc caatgcgtct cccttgtcat 18951

ctaaacccac accgggtgtc ataatcaacc aatcgtaacc ttcatctctt ccacccatgt 19011

ctctttgagc aataaagccg ataacaaaat ctttgtcgct cttcgcaatg tcaacagtac 19071

ccttagtata ttctccagta gatagggagc ccttgcatga caattctgct aacatcaaaa 19131

ggcctctagg ttcctttgtt acttcttctg ccgcctgctt caaaccgcta acaatacctg 19191

ggcccaccac accgtgtgca ttcgtaatgt ctgcccattc tgctattctg tatacacccg 19251

cagagtactg caatttgact gtattaccaa tgtcagcaaa ttttctgtct tcgaagagta 19311

aaaaattgta cttggcggat aatgccttta gcggcttaac tgtgccctcc atggaaaaat 19371

cagtcaagat atccacatgt gtttttagta aacaaatttt gggacctaat gcttcaacta 19431

actccagtaa ttccttggtg gtacgaacat ccaatgaagc acacaagttt gtttgctttt 19491

cgtgcatgat attaaatagc ttggcagcaa caggactagg atgagtagca gcacgttcct 19551

tatatgtagc tttcgacatg atttatcttc gtttcctgca ggtttttgtt ctgtgcagtt 19611

gggttaagaa tactgggcaa tttcatgttt cttcaacact acatatgcgt atatatacca 19671

atctaagtct gtgctccttc cttcgttctt ccttctgttc ggagattacc gaatcaaaaa 19731

aatttcaagg aaaccgaaat caaaaaaaag aataaaaaaa aaatgatgaa ttgaaaagct 19791

tatcgat 19798



<210> 11 
<211> 1771 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.deltaNS3NS5.pj 

<400> 11 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415






Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770



<210> 12 
<211> 20220 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core121

<220> 
<221> CDS 

<222> (12679)..(18354)



<400> 12 
atcgatccta 


ccccttgcgc 


taaagaagta 


tatgtgccta 


ctaacgcttg 


tctttgtctc 


60 


tgtcactaaa 


cactggatta 


ttactcccag 


atacttattt 


tggactaatt 


taaatgattt 


120 


cggatcaacg 


ttcttaatat 


cgctgaatct 


tccacaattg 


atgaaagtag 


ctaggaagag 


180 


gaattggtat 


aaagtttttg 


tttttgtaaa 


tctcgaagta 


tactcaaacg 


aatttagtat 


240 


tttctcagtg 


atctcccaga 


tgctttcacc 


ctcacttaga 


agtgctttaa 


gcattttttt 


300 


actgtggcta 


tttcccttat 


ctgcttcttc 


cgatgattcg 


aactgtaatt 


gcaaactact 


360 


tacaatatca 


gtgatatcag 


attgatgttt 


ttgtccatag 


taaggaataa 


ttgtaaattc 


420 


ccaagcagga 


atcaatttct 


ttaatgaggc 


ttccagaatt 


gttgcttttt 


gcgtcttgta 


480 


tttaaactgg 


agtgatttat 


tgacaatatc 


gaaactcagc 


gaattgctta 


tgatagtatt 


540 


atagctcatg 


aatgtggctc 


tcttgattgc 


tgttccgtta 


tgtgtaatca 


tccaacataa 


600 


ataggttagt 


tcagcagcac 


ataatgctat 


tttctcacct 


gaaggtcttt 


caaacctttc 


660 


cacaaactga 


cgaacaagca 


ccttaggtgg 


tgttttacat 


aatatatcaa 


attgtggcat 


720 


gcttagcgcc 


gatcttgtgt 


gcaattgata 


tctagtttca 


actactctat 


ttatcttgta 


780 


tcttgcagta 


ttcaaacacg 


ctaactcgaa 


aaactaactt 


taattgtcct 


gtttgtctcg 


840 
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cgttctttcg 


aaaaatgcac 


cggccgcgca 


ttatttgtac 


tgcgaaaata 


attggtactg 


900 


cggtatcttc 


atttcatatt 


ttaaaaatgc 


acctttgctg 


cttttcctta 


atttttagac 


960 


ggcccgcagg 


ttcgttttgc 


ggtactatct 


tgtgataaaa 


agttgttttg 


acatgtgatc 


1020 


tgcacagatt 


ttataatgta 


ataagcaaga 


atacattatc 


aaacgaacaa 


tactggtaaa 


1080 


agaaaaccaa 


aatggacgac 


attgaaacag 


ccaagaatct 


gacggtaaaa 


gcacgtacag 


1140 


cttatagcgt 


ctgggatgta 


tgtcggctgt 


ttattgaaat 


gattgctcct 


gatgtagata 


1200 


ttgatataga 


gagtaaacgt 


aagtctgatg 


agctactctt 


tccaggatat 


gtcataaggc 


1260 


ccatggaatc 


tctcacaacc 


ggtaggccgt 


atggtcttga 


ttctagcgca 


gaagattcca 


1320 


gcgtatcttc 


tgactccagt 


gctgaggtaa 


ttttgcctgc 


tgcgaagatg 


gttaaggaaa 


1380 


ggtttgattc 


gattggaaat 


ggtatgctct 


cttcacaaga 


agcaagtcag 


gctgccatag 


1440 


atttgatgct 


acagaataac 


aagctgttag 


acaatagaaa 


gcaactatac 


aaatctattg 


1500 


ctataataat 


aggaagattg 


cccgagaaag 


acaagaagag 


agctaccgaa 


atgctcatga 


1560 


gaaaaatgga 


ttgtacacag 


ttattagtcc 


caccagctcc 


aacggaagaa 


gatgttatga 


1620 


agctcgtaag 


cgtcgttacc 


caattgctta 


ctttagttcc 


accagatcgt 


caagctgctt 


1680 


taataggtga 


tttattcatc 


ccggaatctc 


taaaggatat 


attcaatagt 


ttcaatgaac 


1740 


tggcggcaga 


gaatcgttta 


cagcaaaaaa 


agagtgagtt 


ggaaggaagg 


actgaagtga 


1800 


accatgctaa 


tacaaatgaa 


gaagttccct 


ccaggcgaac 


aagaagtaga 


gacacaaatg 


1860 


caagaggagc 


atataaatta 


caaaacacca 


tcactgaggg 


ccctaaagcg 


gttcccacga 


1920 


aaaaaaggag 


agtagcaacg 


agggtaaggg 


gcagaaaatc 


acgtaatact 


tctagggtat 


1980 


gatccaatat 


caaaggaaat 


gatagcattg 


aaggatgaga 


ctaatccaat 


tgaggagtgg 


2040 


cagcatatag 


aacagctaaa 


gggtagtgct 


gaaggaagca 


tacgataccc 


cgcatggaat 


2100 


gggataatat 


cacaggaggt 


actagactac 


ctttcatcct 


acataaatag 


acgcatataa 


2160 


gtacgcattt 


aagcataaac 


acgcactatg 


ccgttcttct 


catgtatata 


tatatacagg 


2220 


caacacgcag 


atataggtgc 


gacgtgaaca 


gtgagctgta 


tgtgcgcagc 


tcgcgttgca 


2280 


ttttcggaag 


cgctcgtttt 


cggaaacgct 


ttgaagttcc 


tattccgaag 


ttcctattct 


2340 


ctagaaagta 


taggaacttc 


agagcgcttt 


tgaaaaccaa 


aagcgctctg 


aagacgcact 


2400 


ttcaaaaaac 


caaaaacgca 


ccggactgta 


acgagctact 


aaaatattgc 


gaataccgct 


2460 


tccacaaaca 


ttgctcaaaa 


gtatctcttt 


gctatatatc 


tctgtgctat 


atccctatat 


2520 


aacctaccca 


tccacctttc 


gctccttgaa 


cttgcatcta 


aactcgacct 


ctacatcaac 


2580 


aggcttccaa 


tgctcttcaa 


attttactgt 


caagtagacc 


catacggctg 


taatatgctg 


2640
ctcttcataa


tgtaagctta 


tctttatcga 


atcgtgtgaa 


aaactactac 


cgcgataaac 


2700 


ctttacggtt 


ccctgagatt 


gaattagttc 


ctttagtata 


tgatacaaga 


cacttttgaa 


2760 


ctttgtacga 


cgaattttga 


ggttcgccat 


cctctggcta 


tttccaatta 


tcctgtcggc 


2820 


tattatctcc 


gcctcagttt 


gatcttccgc 


ttcagactgc 


catttttcac 


ataatgaatc 


2880 


tatttcaccc 


cacaatcctt 


catccgcctc 


cgcatcttgt 


tccgttaaac 


tattgacttc 


2940 


atgttgtaca ttgtttagtt 


cacgagaagg 


gtcctcttca 


ggcggtagct 


cctgatctcc 


3000 


tatatgacct 


ttatcctgtt 


ctctttccac 


aaacttagaa 


atgtattcat 


gaattatgga 


3060 


gcacctaata 


acattcttca 


aggcggagaa 


gtttgggcca 


gatgcccaat 


atgcttgaca 


3120 


tgaaaacgtg 


agaatgaatt 


tagtattatt 


gtgatattct 


gaggcaattt 


tattataatc 


3180 


tcgaagataa 


gagaagaatg 


cagtgacctt 


tgtattgaca 


aatggagatt 


ccatgtatct 


3240 


aaaaaatacg 


cctttaggcc 


ttctgatacc 


ctttcccctg 


cggtttagcg 


tgccttttac 


3300 


attaatatct 


aaaccctctc 


cgatggtggc 


ctttaactga 


ctaataaatg 


caaccgatat 


3360 


aaactgtgat 


aattctgggt 


gatttatgat 


tcgatcgaca 


attgtattgt 


acactagtgc 


3420 


aggatcaggc 


caatccagtt 


ctttttcaat 


taccggtgtg 


tcgtctgtat 


tcagtacatg 


3480 


tccaacaaat 


gcaaatgcta 


acgttttgta 


tttcttataa 


ttgtcaggaa 


ctggaaaagt 


3540 


cccccttgtc 


gtctcgatta 


cacacctact 


ttcatcgtac 


accataggtt 


ggaagtgctg 


3600 


cataatacat 


tgcttaatac 


aagcaagcag 


tctctcgcca 


ttcatatttc 


agttattttc 


3660 


cattacagct 


gatgtcattg 


tatatcagcg 


ctgtaaaaat 


ctatctgtta 


cagaaggttt 


3720 


tcgcggtttt 


tataaacaaa 


actttcgtta 


cgaaatcgag 


caatcacccc 


agctgcgtat 


3780 


ttggaaattc 


gggaaaaagt 


agagcaacgc 


gagttgcatt 


ttttacacca 


taatgcatga 


3840 


ttaacttcga 


gaagggatta 


aggctaattt 


cactagtatg 


tttcaaaaac 


ctcaatctgt 


3900 


ccattgaatg 


ccttataaaa 


cagctataga 


ttgcatagaa 


gagttagcta 


ctcaatgctt 


3960 


tttgtcaaag 


cttactgatg 


atgatgtgtc 


tactttcagg 


cgggtctgta 


gtaaggagaa 


4020 


tgacattata 


aagctggcac 


ttagaattcc 


acggactata 


gactatacta 


gtatactccg 


4080 


tctactgtac 


gatacacttc 


cgctcaggtc 


cttgtccttt 


aacgaggcct 


taccactctt 


4140 


ttgttactct 


attgatccag 


ctcagcaaag 


gcagtgtgat 


ctaagattct 


atcttcgcga 


4200 


tgtagtaaaa 


ctagctagac 


cgagaaagag 


actagaaatg 


caaaaggcac 


ttctacaatg 


4260 


gctgccatca 


ttattatccg 


atgtgacgct 


gcattttttt 


tttttttttt 


tttttttttt 


4320 


tttttttttt 


tttttttttt 


ttttttggta 


caaatatcat 


aaaaaaagag 


aatcttttta 


4380 


agcaaggatt 


ttcttaactt 


cttcggcgac 


agcatcaccg 


acttcggtgg 


tactgttgga 


4440
accacctaaa


tcaccagttc 


tgatacctgc 


atccaaaacc 


tttttaactg 


catcttcaat 


4500 


ggctttacct 


tcttcaggca 


agttcaatga 


caatttcaac 


atcattgcag 


cagacaagat 


4560 


agtggcgata 


gggttgacct 


tattctttgg 


caaatctgga 


gcggaaccat 


ggcatggttc 


4620 


gtacaaacca 


aatgcggtgt 


tcttgtctgg 


caaagaggcc 


aaggacgcag 


atggcaacaa 


4680 


acccaaggag 


cctgggataa 


cggaggcttc 


atcggagatg 


atatcaccaa 


acatgttgct 


4740 


ggtgattata 


ataccattta 


ggtgggttgg 


gttcttaact 


aggatcatgg 


cggcagaatc 


4800 


aatcaattga 


tgttgaactt 


tcaatgtagg 


gaattcgttc 


ttgatggttt 


cctccacagt 


4860


ttttctccat 


aatcttgaag 


aggccaaaac 


attagcttta 


tccaaggacc 


aaataggcaa 


4920 


tggtggctca 


tgttgtaggg 


ccatgaaagc 


ggccattctt 


gtgattcttt 


gcacttctgg 


4980 


aacggtgtat 


tgttcactat 


cccaagcgac 


accatcacca 


tcgtcttcct 


ttctcttacc 


5040 


aaagtaaata 


cctcccacta 


attctctaac 


aacaacgaag 


tcagtacctt 


tagcaaattg 


5100 


tggcttgatt 


ggagataagt 


ctaaaagaga 


gtcggatgca 


aagttacatg 


gtcttaagtt 


5160 


ggcgtacaat 


tgaagttctt 


tacggatttt 


tagtaaacct 


tgttcaggtc 


taacactacc 


5220 


ggtaccccat 


ttaggaccac 


ccacagcacc 


taacaaaacg 


gcatcagcct 


tcttggaggc 


5280 


ttccagcgcc 


tcatctggaa 


gtggaacacc 


tgtagcatcg 


atagcagcac 


caccaattaa 


5340 


atgattttcg 


aaatcgaact 


tgacattgga 


acgaacatca 


gaaatagctt 


taagaacctt 


5400 


aatggcttcg 


gctgtgattt 


cttgaccaac 


gtggtcacct 


ggcaaaacga 


cgatcttctt 


5460 


aggggcagac 


attacaatgg 


tatatccttg 


aaatatatat 


aaaaaaaaaa 


aaaaaaaaaa 


5520 


aaaaaaaaaa 


atgcagcttc 


tcaatgatat 


tcgaatacgc 


tttgaggaga 


tacagcctaa 


5580 


tatccgacaa 


actgttttac 


agatttacga 


tcgtacttgt 


tacccatcat 


tgaattttga 


5640 


acatccgaac 


ctgggagttt 


tccctgaaac 


agatagtata 


tttgaacctg 


tataataata 


5700 


tatagtctag 


cgctttacgg 


aagacaatgt 


atgtatttcg 


gttcctggag 


aaactattgc 


5760 


atctattgca 


taggtaatct 


tgcacgtcgc 


atccccggtt 


cattttctgc 


gtttccatct 


5820 


tgcacttcaa 


tagcatatct 


ttgttaacga 


agcatctgtg 


cttcattttg 


tagaacaaaa 


5880 


atgcaacgcg 


agagcgctaa 


tttttcaaac 


aaagaatctg 


agctgcattt 


ttacagaaca 


5940 


gaaatgcaac 


gcgaaagcgc 


tattttacca 


acgaagaatc 


tgtgcttcat 


ttttgtaaaa 


6000 


caaaaatgca 


acgcgagagc 


gctaattttt 


caaacaaaga 


atctgagctg 


catttttaca 


6060 


gaacagaaat 


gcaacgcgag 


agcgctattt 


taccaacaaa 


gaatctatac 


ttcttttttg 


6120 


ttctacaaaa 


atgcatcccg 


agagcgctat 


ttttctaaca 


aagcatctta 


gattactttt 


6180 


tttctccttt 


gtgcgctcta 


taatgcagtc 


tcttgataac 


tttttgcact 


gtaggtccgt 


6240
taaggttaga


agaaggctac 


tttggtgtct 


attttctctt 


ccataaaaaa 


agcctgactc 


6300 


cacttcccgc 


gtttactgat 


tactagcgaa 


gctgcgggtg 


cattttttca 


agataaaggc 


6360 


atccccgatt 


atattctata 


ccgatgtgga 


ttgcgcatac 


tttgtgaaca 


gaaagtgata 


6420 


gcgttgatga 


ttcttcattg 


gtcagaaaat 


tatgaacggt 


ttcttctatt 


ttgtctctat 


6480 


atactacgta 


taggaaatgt 


ttacattttc 


gtattgtttt 


cgattcactc 


tatgaatagt 


6540 


tcttactaca 


atttttttgt 


ctaaagagta 


atactagaga 


taaacataaa 


aaatgtagag 


6600 


gtcgagttta 


gatgcaagtt 


caaggagcga 


aaggtggatg 


ggtaggttat 


atagggatat 


6660 


agcacagaga 


tatatagcaa 


agagatactt 


ttgagcaatg 


tttgtggaag 


cggtattcgc 


6720 


aatattttag 


tagctcgtta 


cagtccggtg 


cgtttttggt 


tttttgaaag 


tgcgtcttca 


6780 


gagcgctttt 


ggttttcaaa 


agcgctctga 


agttcctata 


ctttctagag 


aataggaact 


6840 


tcggaatagg 


aacttcaaag 


cgtttccgaa 


aacgagcgct 


tccgaaaatg 


caacgcgagc 


6900 


tgcgcacata 


cagctcactg 


ttcacgtcgc 


acctatatct 


gcgtgttgcc 


tgtatatata 


6960 


tatacatgag 


aagaacggca 


tagtgcgtgt 


ttatgcttaa 


atgcgtactt 


atatgcgtct 


7020 


atttatgtag 


gatgaaaggt 


agtctagtac 


ctcctgtgat 


attatcccat 


tccatgcggg 


7080 


gtatcgtatg 


cttccttcag 


cactaccctt 


tagctgttct 


atatgctgcc 


actcctcaat 


7140 


tggattagtc 


tcatccttca 


atgctatcat 


ttcctttgat 


attggatcat 


atgcatagta 


7200 


ccgagaaact 


agtgcgaagt 


agtgatcagg 


tattgctgtt 


atctgatgag 


tatacgttgt 


7260 


cctggccacg 


gcagaagcac 


gcttatcgct 


ccaatttccc 


acaacattag 


tcaactccgt 


7320 


taggcccttc 


attgaaagaa 


atgaggtcat 


caaatgtctt 


ccaatgtgag 


attttgggcc 


7380 


attttttata 


gcaaagattg 


aataaggcgc 


atttttcttc 


aaagctttat 


tgtacgatct 


7440 


gactaagtta 


tcttttaata 


attggtattc 


ctgtttattg 


cttgaagaat 


tgccggtcct 


7500 


atttactcgt 


tttaggactg 


gttcagaatt 


cctcaaaaat 


tcatccaaat 


atacaagtgg 


7560 


atcgatgata 


agctgtcaaa 


catgagaatt 


cttgaagacg 


aaagggcctc 


gtgatacgcc 


7620 


tatttttata 


ggttaatgtc 


atgataataa 


tggtttctta 


gacgtcaggt 


ggcacttttc 


7680 


ggggaaatgt 


gcgcggaacc 


cctatttgtt 


tatttttcta 


aatacattca 


aatatgtatc 


7740 


cgctcatgag 


acaataaccc 


tgataaatgc 


ttcaataata 


ttgaaaaagg 


aagagtatga 


7800 


gtattcaaca 


tttccgtgtc 


gcccttattc 


ccttttttgc 


ggcattttgc 


cttcctgttt 


7860 


ttgctcaccc 


agaaacgctg 


gtgaaagtaa 


aagatgctga 


agatcagttg 


ggtgcacgag 


7920 


tgggttacat 


cgaactggat 


ctcaacagcg 


gtaagatcct 


tgagagtttt 


cgccccgaag 


7980 


aacgttttcc 


aatgatgagc 


acttttaaag 


ttctgctatg 


tggcgcggta 


ttatcccgtg 


8040
ttgacgccgg


gcaagagcaa 


ctcggtcgcc 


gcatacacta 


ttctcagaat 


gacttggttg 


8100 


agtactcacc 


agtcacagaa 


aagcatctta 


cggatggcat 


gacagtaaga 


gaattatgca 


8160 


gtgctgccat 


aaccatgagt 


gataacactg 


cggccaactt 


acttctgaca 


acgatcggag 


8220 


gaccgaagga 


gctaaccgct 


tttttgcaca 


acatggggga 


tcatgtaact 


cgccttgatc 


8280 


gttgggaacc 


ggagctgaat 


gaagccatac 


caaacgacga 


gcgtgacacc 


acgatgcctg 


8340 


cagcaatggc 


aacaacgttg 


cgcaaactat 


taactggcga 


actacttact 


ctagcttccc 


8400 


ggcaacaatt 


aatagactgg 


atggaggcgg 


ataaagttgc 


aggaccactt 


ctgcgctcgg 


8460 


cccttccggc 


tggctggttt 


attgctgata 


aatctggagc 


cggtgagcgt 


gggtctcgcg 


8520 


gtatcattgc 


agcactgggg 


ccagatggta 


agccctcccg 


tatcgtagtt 


atctacacga 


8580 


cggggagtca 


ggcaactatg 


gatgaacgaa 


atagacagat 


cgctgagata 


ggtgcctcac 


8640 


tgattaagca 


ttggtaactg 


tcagaccaag 


tttactcata 


tatactttag 


attgatttaa 


8700 


aacttcattt 


ttaatttaaa 


aggatctagg 


tgaagatcct 


ttttgataat 


ctcatgacca 


8760 


aaatccctta 


acgtgagttt 


tcgttccact 


gagcgtcaga 


ccccgtagaa 


aagatcaaag 


8820 


gatcttcttg 


agatcctttt 


tttctgcgcg 


taatctgctg 


cttgcaaaca 


aaaaaaccac 


8880 


cgctaccagc 


ggtggtttgt 


ttgccggatc 


aagagctacc 


aactcttttt 


ccgaaggtaa 


8940 


ctggcttcag 


cagagcgcag 


ataccaaata 


ctgtccttct 


agtgtagccg 


tagttaggcc 


9000 


accacttcaa 


gaactctgta 


gcaccgccta 


catacctcgc 


tctgctaatc 


ctgttaccag 


9060 


tggctgctgc 


cagtggcgat 


aagtcgtgtc 


ttaccgggtt 


ggactcaaga 


cgatagttac 


9120 


cggataaggc 


gcagcggtcg 


ggctgaacgg 


ggggttcgtg 


cacacagccc 


agcttggagc 


9180 


gaacgaccta 


caccgaactg 


agatacctac 


agcgtgagct 


atgagaaagc 


gccacgcttc 


9240 


ccgaagggag 


aaaggcggac 


aggtatccgg 


taagcggcag 


ggtcggaaca 


ggagagcgca 


9300 


cgagggagct 


tccaggggga 


aacgcctggt 


atctttatag 


tcctgtcggg 


tttcgccacc 


9360 


tctgacttga 


gcgtcgattt 


ttgtgatgct 


cgtcaggggg 


gcggagccta 


tggaaaaacg 


9420 


ccagcaacgc 


ggccttttta 


cggttcctgg 


ccttttgctg 


gccttttgct 


cacatgttct 


9480 


ttcctgcgtt 


atcccctgat 


tctgtggata 


accgtattac 


cgcctttgag 


tgagctgata 


9540 


ccgctcgccg 


cagccgaacg 


accgagcgca 


gcgagtcagt 


gagcgaggaa 


gcggaagagc 


9600 


gcctgatgcg 


gtattttctc 


cttacgcatc 


tgtgcggtat 


ttcacaccgc 


atatggtgca 


9660 


ctctcagtac 


aatctgctct 


gatgccgcat 


agttaagcca 


gtatacactc 


cgctatcgct 


9720 


acgtgactgg 


gtcatggctg 


cgccccgaca 


cccgccaaca 


cccgctgacg 


cgccctgacg 


9780 


ggcttgtctg 


ctcccggcat 


ccgcttacag 


acaagctgtg 


accgtctccg 


ggagctgcat 


9840
gtgtcagagg


ttttcaccgt 


catcaccgaa 


acgcgcgagg 


cagctgcggt 


aaagctcatc 


9900 


agcgtggtcg 


tgaagcgatt 


cacagatgtc 


tgcctgttca 


tccgcgtcca 


gctcgttgag 


9960 


tttctccaga 


agcgttaatg 


tctggcttct 


gataaagcgg 


gccatgttaa 


gggcggtttt 


10020 


ttcctgtttg 


gtcactgatg 


cctccgtgta 


agggggattt 


ctgttcatgg 


gggtaatgat 


10080 


accgatgaaa 


cgagagagga 


tgctcacgat 


acgggttact 


gatgatgaac 


atgcccggtt 


10140 


actggaacgt 


tgtgagggta 


aacaactggc 


ggtatggatg 


cggcgggacc 


agagaaaaat 


10200 


cactcagggt 


caatgccagc 


gcttcgttaa 


tacagatgta 


ggtgttccac 


agggtagcca 


10260 


gcagcatcct 


gcgatgcaga 


tccggaacat 


aatggtgcag 


ggcgctgact 


tccgcgtttc 


10320 


cagactttac 


gaaacacgga 


aaccgaagac 


cattcatgtt 


gttgctcagg 


tcgcagacgt 


10380 


tttgcagcag 


cagtcgcttc 


acgttcgctc 


gcgtatcggt 


gattcattct 


gctaaccagt 


10440 


aaggcaaccc 


cgccagccta 


gccgggtcct 


caacgacagg 


agcacgatca 


tgcgcacccg 


10500 


tggccaggac 


ccaacgctgc 


ccgagatgcg 


ccgcgtgcgg 


ctgctggaga 


tggcggacgc 


10560 


gatggatatg 


ttctgccaag 


ggttggtttg 


cgcattcaca 


gttctccgca 


agaattgatt 


10620 


ggctccaatt 


cttggagtgg 


tgaatccgtt 


agcgaggtgc 


cgccggcttc 


cattcaggtc 


10680 


gaggtggccc 


ggctccatgc 


accgcgacgc 


aacgcgggga 


ggcagacaag 


gtatagggcg 


10740 


gcgcctacaa 


tccatgccaa 


cccgttccat 


gtgctcgccg 


aggcggcata 


aatcgccgtg 


10800 


acgatcagcg 


gtccaatgat 


cgaagttagg 


ctggtaagag 


ccgcgagcga 


tccttgaagc 


10860 


tgtccctgat 


ggtcgtcatc 


tacctgcctg 


gacagcatgg 


cctgcaacgc 


gggcatcccg 


10920 


atgccgccgg 


aagcgagaag 


aatcataatg 


gggaaggcca 


tccagcctcg 


cgtcgcgaac 


10980 


gccagcaaga 


cgtagcccag 


cgcgtcggcc 


gccatgccgg 


cgataatggc 


ctgcttctcg 


11040 


ccgaaacgtt 


tggtggcggg 


accagtgacg 


aaggcttgag 


cgagggcgtg 


caagattccg 


11100 


aataccgcaa 


gcgacaggcc 


gatcatcgtc 


gcgctccagc 


gaaagcggtc 


ctcgccgaaa 


11160 


atgacccaga 


gcgctgccgg 


cacctgtcct 


acgagttgca 


tgataaagaa 


gacagtcata 


11220 


agtgcggcga 


cgatagtcat 


gccccgcgcc 


caccggaagg 


agctgactgg 


gttgaaggct 


11280 


ctcaagggca 


tcggtcgagg 


atccttcaat 


atgcgcacat 


acgctgttat 


gttcaaggtc 


11340 


ccttcgttta 


agaacgaaag 


cggtcttcct 


tttgagggat 


gtttcaagtt 


gttcaaatct 


11400 


atcaaatttg 


caaatcccca 


gtctgtatct 


agagcgttga 


atcggtgatg 


cgatttgtta 


11460 


attaaattga 


tggtgtcacc 


attaccaggt 


ctagatatac 


caatggcaaa 


ctgagcacaa 


11520 


caataccagt 


ccggatcaac 


tggcaccatc 


tctcccgtag 


tctcatctaa 


tttttcttcc 


11580 


ggatgaggtt 


ccagatatac 


cgcaacacct 


ttattatggt 


ttccctgagg 


gaataataga 


11640
atgtcccatt


cgaaatcacc 


aattctaaac 


ctgggcgaat 


tgtatttcgg 


gtttgttaac 


11700 


tcgttccagt 


caggaatgtt 


ccacgtgaag 


ctatcttcca 


gcaaagtctc 


cacttcttca 


11760 


tcaaattgtg 


gagaatactc 


ccaatgctct 


tatctatggg 


acttccggga 


aacacagtac 


11820 


cgatacttcc 


caattcgtct 


tcagagctca 


ttgtttgttt 


gaagagacta 


atcaaagaat 


11880 


cgttttctca 


aaaaaattaa 


tatcttaact 


gatagtttga 


tcaaaggggc 


aaaacgtagg 


11940 


ggcaaacaaa 


cggaaaaatc 


gtttctcaaa 


ttttctgatg 


ccaagaactc 


taaccagtct 


12000 


tatctaaaaa 


ttgccttatg 


atccgtctct 


ccggttacag 


cctgtgtaac 


tgattaatcc 


12060 


tgcctttcta 


atcaccattc 


taatgtttta 


attaagggat 


tttgtcttca 


ttaacggctt 


12120 


tcgctcataa 


aaatgttatg 


acgttttgcc 


cgcaggcggg 


aaaccatcca 


cttcacgaga 


12180 


ctgatctcct 


ctgccggaac 


accgggcatc 


tccaacttat 


aagttggaga 


aataagagaa 


12240 


tttcagattg 


agagaatgaa 


aaaaaaaaac 


ccttagttca 


taggtccatt 


ctcttagcgc 


12300 


aactacagag 


aacaggggca 


caaacaggca 


aaaaacgggc 


acaacctcaa 


tggagtgatg 


12360 


caacctgcct 


ggagtaaatg 


atgacacaag 


gcaattgacc 


cacgcatgta 


tctatctcat 


12420 


tttcttacac 


cttctattac 


cttctgctct 


ctctgatttg 


gaaaaagctg 


aaaaaaaagg 


12480 


ttgaaaccag 


ttccctgaaa 


ttattcccct 


acttgactaa 


taagtatata 


aagacggtag 


12540 


gtattgattg 


taattctgta 


aatctatttc 


ttaaacttct 


taaattctac 


ttttatagtt 


12600 


agtctttttt 


ttagttttaa 


aacaccaaga 


acttagtttc 


gaataaacac 


acataaacaa 


12660 



acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

*99 gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050




WO 01/38360 



PCT/US00/32326 



gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala
1805 1810 1815

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag taatagtcga ctttgttccc 18374
Arg Arg Arg Ser Arg Asn Leu Gly Lys
1885 1890



actgtacttt 


tagctcgtac


aaaatacaat 


atacttttca 


tttctccgta 


aacaacatgt 


18434 


tttcccatgt 


aatatccttt 


tctatttttc 


gttccgttac 


caactttaca 


catactttat 


18494 


atagctattc 


acttctatac 


actaaaaaac 


taagacaatt 


ttaattttgc 


tgcctgccat 


18554 


atttcaattt 


gttataaatt 


cctataattt 


atcctattag 


tagctaaaaa 


aagatgaatg 


18614 


tgaategaat 


cctaagagaa 


ttggatctga 


tccacaggac 


gggtgtggtc 


gecatgateg 


18674 


cgtagtcgat 


agtggctcca 


agtagcgaag 


cgagcaggac 


tgggeggegg 


ecaaageggt 


18734 


cggacagtgc 


tccgagaacg 


ggtgcgcata 


gaaattgeat 


caaegcatat 


agegctagea 


18794 


gcacgccata 


gtgactggcg 


atgctgtcgg 


aatggacgat 


atcccgcaag 


aggcccggca 


18854 


gtaceggcat 


aaccaagcct 


atgcctacag 


catccagggt 


gacggtgccg 


aggatgacga 


18914 


tgagegcatt 


gttagatttc 


atacaeggtg 


cctgactgcg 


ttagcaattt 


aactgtgata 


18974 


aactaccgca 


ttaaagcttt 


ttctttccaa 


tttttttttt 


ttegtcatta 


taaaaatcat 


19034 


tacgaccgag 


attccegggt 


aataactgat 


ataattaaat 


tgaagctcta 


atttgtgagt 


19094 


ttagtataca 


tgcatttact 


tataatacag 


ttttttagtt 


ttgctggccg 


catcttctca 


19154 


aatatgette 


ccagcctgct 


tttctgtaac 


gttcaccctc 


taccttagca 


tcccttccct 


19214 






ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc acatcatcca 19274
cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca ccgggtgtca 19334
taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca ataaagccga 19394
taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat tctccagtag 19454
atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt tcctttgtta 19514
cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca ccgtgtgcat 19574
tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc aatttgactg 19634
tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac ttggcggata 19694
atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata tccacatgtg 19754
tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat tccttggtgg 19814
tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata ttaaatagct 19874
tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct ttcgacatga 19934
tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat actgggcaat 19994
ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg tgctccttcc 20054
ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga aaccgaaatc 20114
aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20160



<210> 13 
<211> 1892 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core121

<400> 13 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser
1810 1815 1820

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg
1860 1865 1870

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg
1875 1880 1885

Asn Leu Gly Lys
1890
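
A three-letter listing such as the one above can be checked mechanically against its declared <211> length. The following is a minimal sketch, not part of the original disclosure: the helper names `parse_listing_rows` and `to_one_letter` are hypothetical, and the parser assumes that ruler rows contain only position numbers and that every other token is a standard three-letter residue code. Any unrecognized token (for example an extraction error) raises an error instead of being silently skipped.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing_rows(text):
    """Collect residue tokens from listing text, skipping ruler and blank lines."""
    residues = []
    for line in text.splitlines():
        tokens = line.split()
        # Ruler rows hold only position numbers; blank lines hold nothing.
        if not tokens or all(tok.isdigit() for tok in tokens):
            continue
        for tok in tokens:
            if tok not in THREE_TO_ONE:
                raise ValueError(f"unrecognized residue code: {tok!r}")
            residues.append(tok)
    return residues


def to_one_letter(residues):
    """Convert a list of three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[r] for r in residues)


# Last two rows of the listing above, used here only as sample input.
tail = """Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg
1875 1880 1885

Asn Leu Gly Lys
1890"""

residues = parse_listing_rows(tail)
print(len(residues), to_one_letter(residues))
```

Run over the complete <400> 13 block, the residue count returned by `parse_listing_rows` would be expected to equal the <211> value of 1892; a mismatch would point to a dropped or garbled residue.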



<210> 14 
<211> 20316 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core173

<220> 
<221> CDS 

<222> (12679)..(18510)
<400> 14 

atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920


aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720


tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520


aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260






cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120


cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920


atgccgccgg 


aagcgagaag 


aatcataatg 


gggaaggcca 


tccagcctcg 


cgtcgcgaac 


10980 


gccagcaaga 


cgtagcccag 


cgcgtcggcc 


gccatgccgg 


cgataatggc 


ctgcttctcg 


11040 


ccgaaacgtt 


tggtggcggg 


accagtgacg 


aaggcttgag 


cgagggcgtg 


caagattccg 


11100 


aataccgcaa 


gcgacaggcc 


gatcatcgtc 


gcgctccagc 


gaaagcggtc 


ctcgccgaaa 


11160 


atgacccaga 


gcgctgccgg 


cacctgtcct 


acgagttgca 


tgataaagaa 


gacagtcata 


11220 


agtgcggcga 


cgatagtcat 


gccccgcgcc 


caccggaagg 


agctgactgg 


gttgaaggct 


11280 


ctcaagggca 


tcggtcgagg 


atccttcaat 


atgcgcacat 


acgctgttat 


gttcaaggtc 


11340 


ccttcgttta 


agaacgaaag 


cggtcttcct 


tttgagggat 


gtttcaagtt 


gttcaaatct 


11400 


atcaaatttg 


caaatcccca 


gtctgtatct 


agagcgttga 


atcggtgatg 


cgatttgtta 


11460 


attaaattga 


tggtgtcacc 


attaccaggt 


ctagatatac 


caatggcaaa 


ctgagcacaa 


11520 


caataccagt 


ccggatcaac 


tggcaccatc 


tctcccgtag 


tctcatctaa 


tttttcttcc 


11580 


ggatgaggtt 


ccagatatac 


cgcaacacct 


ttattatggt 


ttccctgagg 


gaataataga 


11640 


atgtcccatt 


cgaaatcacc 


aattctaaac 


ctgggcgaat 


tgtatttcgg 


gtttgttaac 


11700 


tcgttccagt 


caggaatgtt 


ccacgtgaag 


ctatcttcca 


gcaaagtctc 


cacttcttca 


11760 


tcaaattgtg 


gagaatactc 


ccaatgctct 


tatctatggg 


acttccggga 


aacacagtac 


11820 


cgatacttcc 


caattcgtct 


tcagagctca 


ttgtttgttt 


gaagagacta 


atcaaagaat 


11880 


cgttttctca 


aaaaaattaa 


tatcttaact 


gatagtttga 


tcaaaggggc 


aaaacgtagg 


11940 


ggcaaacaaa 


cggaaaaatc 


gtttctcaaa 


ttttctgatg 


ccaagaactc 


taaccagtct 


12000 


tatctaaaaa 


ttgccttatg 


atccgtctct 


ccggttacag 


cctgtgtaac 


tgattaatcc 


12060 


tgcctttcta 


atcaccattc 


taatgtttta 


attaagggat 


tttgtcttca 


ttaacggctt 


12120 


tcgctcataa 


aaatgttatg 


acgttttgcc 


cgcaggcggg 


aaaccatcca 


cttcacgaga 


12180 


ctgatctcct 


ctgccggaac 


accgggcatc 


tccaacttat 


aagttggaga 


aataagagaa 


12240 


tttcagattg 


agagaatgaa 


aaaaaaaaac 


ccttagttca 


taggtccatt 


ctcttagcgc 


12300 


aactacagag 


aacaggggca 


caaacaggca 


aaaaacgggc 


acaacctcaa 


tggagtgatg 


12360 


caacctgcct 


ggagtaaatg 


atgacacaag 


gcaattgacc 


cacgcatgta 


tctatctcat 


12420 


tttcttacac 


cttctattac 


cttctgctct 


ctctgatttg 


gaaaaagctg 


aaaaaaaagg 


12480 


ttgaaaccag 


ttccctgaaa 


ttattcccct 


acttgactaa 


taagtatata 


aagacggtag 


12540 


gtattgattg 


taattctgta 


aatctatttc 


ttaaacttct 


taaattctac 


ttttatagtt 


12600 


agtctttttt 


ttagttttaa 


aacaccaaga 


acttagtttc 


gaataaacac 


acataaacaa 


12660 
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acaagcttac aaaacaaa atg get gca tat gca get cag ggc tat aag gtg 12711 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210



cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450



gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675






ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala
1805 1810 1815

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc ggc gcc cct ctt 18423
Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu
1900 1905 1910 1915

gga ggc gct gcc agg gcc ctg gcg cat ggc gtc cgg gtt ctg gaa gac 18471
Gly Gly Ala Ala Arg Ala Leu Ala His Gly Val Arg Val Leu Glu Asp 
1920 1925 1930 



ggc gtg aac tat gca aca ggg aac ctt cct ggt tgc tct taatagtcga 18520 
Gly Val Asn Tyr Ala Thr Gly Asn Leu Pro Gly Cys Ser 
1935 1940 



ctttgttccc actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta 18580
aacaacatgt tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca 18640
catactttat atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc 18700
tgcctgccat atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa 18760
aagatgaatg tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc 18820
gccatgatcg cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg 18880
ccaaagcggt cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat 18940
agcgctagca gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag 19000
aggcccggca gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg 19060
aggatgacga tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt 19120
aactgtgata aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta 19180
taaaaatcat tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta 19240
atttgtgagt ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg 19300
catcttctca aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca 19360
tcccttccct ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc 19420
acatcatcca cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca 19480
ccgggtgtca taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca 19540
ataaagccga taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat 19600
tctccagtag atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt 19660
tcctttgtta cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca 19720
ccgtgtgcat tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc 19780
aatttgactg tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac 19840
ttggcggata atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata 19900
tccacatgtg tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat 19960
tccttggtgg tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata 20020
ttaaatagct tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct 20080
ttcgacatga tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat 20140
actgggcaat ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg 20200 
tgctccttcc ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga 20260 
aaccgaaatc aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20316 



<210> 15 
<211> 1944 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core173

<400> 15 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 
195 200 205 

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540 

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860 

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser
1810 1815 1820 

Glu Arg Ser Gin Pro Arg Gly Arg Arg Gin Pro lie Pro Lys Ala Arg 
825 1830 1835 1840 

Arg Pro Glu Gly Arg Thr Trp Ala Gin Pro Gly Tyr Pro Trp Pro Leu 
1845 1850 1855 

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg 
1860 1865 1870 

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg 
1875 1880 1885 

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu
1890 1895 1900

Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu Gly Gly Ala Ala Arg
1905 1910 1915 1920

Ala Leu Ala His Gly Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala
1925 1930 1935

Thr Gly Asn Leu Pro Gly Cys Ser 
1940 



<210> 16 

<211> 20217 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pd.delta.NS3NS5.pj.core140

<220> 

<221> CDS 

<222> (12679)..(18411)

<400> 16 



atcgatccta ccccttgcgc 


taaagaagta 


tatgtgccta 


ctaacgcttg tctttgtctc 


60 


tgtcactaaa cactggatta 


ttactcccag 


atacttattt 


tggactaatt taaatgattt 


120 


cggatcaacg ttcttaatat 


cgctgaatct 


tccacaattg 


atgaaagtag ctaggaagag 


180 


gaattggtat aaagtttttg 


tttttgtaaa 


tctcgaagta 


tactcaaacg aatttagtat 


240 


tttctcagtg atctcccaga 


tgctttcacc 


ctcacttaga 


agtgctttaa gcattttttt 


300 


actgtggcta tttcccttat 


ctgcttcttc 


cgatgattcg 


aactgtaatt gcaaactact 


360 


tacaatatca gtgatatcag 


attgatgttt 


ttgtccatag 


taaggaataa ttgtaaattc 


420 


ccaagcagga atcaatttct 


ttaatgaggc 


ttccagaatt 


gttgcttttt gcgtcttgta 


480


tttaaactgg agtgatttat 


tgacaatatc 


gaaactcagc 


gaattgctta tgatagtatt 


540 
atagctcatg


aatgtggctc 


tcttgattgc 


tgttccgtta 


tgtgtaatca 


tccaacataa 


600 


ataggttagt 


tcagcagcac 


ataatgctat 


tttctcacct 


gaaggtcttt 


caaacctttc 


660 


cacaaactga 


cgaacaagca 


ccttaggtgg 


tgttttacat 


aatatatcaa 


attgtggcat 


720 


gcttagcgcc 


gatcttgtgt 


gcaattgata 


tctagtttca 


actactctat 


ttatcttgta 


780 


tcttgcagta 


ttcaaacacg 


ctaactcgaa 


aaactaactt 


taattgtcct 


gtttgtctcg 


840 


cgttctttcg 


aaaaatgcac 


cggccgcgca 


ttatttgtac 


tgcgaaaata 


attggtactg 


900 


cggtatcttc 


atttcatatt 


ttaaaaatgc 


acctttgctg 


cttttcctta 


atttttagac 


960 


ggcccgcagg 


ttcgttttgc 


ggtactatct 


tgtgataaaa 


agttgttttg 


acatgtgatc 


1020 


tgcacagatt 


ttataatgta 


ataagcaaga 


atacattatc 


aaacgaacaa 


tactggtaaa 


1080 


agaaaaccaa 


aatggacgac 


attgaaacag 


ccaagaatct 


gacggtaaaa 


gcacgtacag 


1140 


cttatagcgt 


ctgggatgta 


tgtcggctgt 


ttattgaaat 


gattgctcct 


gatgtagata 


1200 


ttgatataga 


gagtaaacgt 


aagtctgatg 


agctactctt 


tccaggatat 


gtcataaggc 


1260 


ccatggaatc 


tctcacaacc 


ggtaggccgt 


atggtcttga 


ttctagcgca 


gaagattcca 


1320 


gcgtatcttc 


tgactccagt 


gctgaggtaa 


ttttgcctgc 


tgcgaagatg 


gttaaggaaa 


1380 


ggtttgattc 


gattggaaat 


ggtatgctct 


cttcacaaga 


agcaagtcag 


gctgccatag 


1440 


atttgatgct 


acagaataac 


aagctgttag 


acaatagaaa 


gcaactatac 


aaatctattg 


1500 


ctataataat 


aggaagattg 


cccgagaaag 


acaagaagag 


agctaccgaa 


atgctcatga 


1560 


gaaaaatgga 


ttgtacacag 


ttattagtcc 


caccagctcc 


aacggaagaa 


gatgttatga 


1620 


agctcgtaag 


cgtcgttacc 


caattgctta 


ctttagttcc 


accagatcgt 


caagctgctt 


1680 


taataggtga 


tttattcatc 


ccggaatctc 


taaaggatat 


attcaatagt 


ttcaatgaac 


1740 


tggcggcaga 


gaatcgttta 


cagcaaaaaa 


agagtgagtt 


ggaaggaagg 


actgaagtga 


1800 


accatgctaa 


tacaaatgaa 


gaagttccct 


ccaggcgaac 


aagaagtaga 


gacacaaatg 


1860 


caagaggagc 


atataaatta 


caaaacacca 


tcactgaggg 


ccctaaagcg 


gttcccacga 


1920 


aaaaaaggag 


agtagcaacg 


agggtaaggg 


gcagaaaatc 


acgtaatact 


tctagggtat 


1980 


gatccaatat 


caaaggaaat 


gatagcattg 


aaggatgaga 


ctaatccaat 


tgaggagtgg 


2040 


cagcatatag 


aacagctaaa 


gggtagtgct 


gaaggaagca 


tacgataccc 


cgcatggaat 


2100 


gggataatat 


cacaggaggt 


actagactac 


ctttcatcct 


acataaatag 


acgcatataa 


2160 


gtacgcattt 


aagcataaac 


acgcactatg 


ccgttcttct 


catgtatata 


tatatacagg 


2220 


caacacgcag 


atataggtgc 


gacgtgaaca 


gtgagctgta 


tgtgcgcagc 


tcgcgttgca 


2280 


ttttcggaag 


cgctcgtttt 


cggaaacgct 


ttgaagttcc 


tattccgaag 


ttcctattct 


2340 
ctagaaagta


taggaacttc 


agagcgcttt 


tgaaaaccaa 


aagcgctctg 


aagacgcact 


2400 


ttcaaaaaac 


caaaaacgca 


ccggactgta 


acgagctact 


aaaatattgc 


gaataccgct 


2460 


tccacaaaca 


ttgctcaaaa 


gtatctcttt 


gctatatatc 


tctgtgctat 


atccctatat 


2520 


aacctaccca 


tccacctttc 


gctccttgaa 


cttgcatcta 


aactcgacct 


ctacatcaac 


2580 


aggcttccaa 


tgctcttcaa 


attttactgt 


caagtagacc 


catacggctg 


taatatgctg 


2640 


ctcttcataa 


tgtaagctta 


tctttatcga 


atcgtgtgaa 


aaactactac 


cgcgataaac 


2700 


ctttacggtt 


ccctgagatt 


gaattagttc 


ctttagtata 


tgatacaaga 


cacttttgaa 


2760 


ctttgtacga 


cgaattttga 


ggttcgccat 


cctctggcta 


tttccaatta 


tcctgtcggc 


2820 


tattatctcc 


gcctcagttt 


gatcttccgc 


ttcagactgc 


catttttcac 


ataatgaatc 


2880 


tatttcaccc 


cacaatcctt 


catccgcctc 


cgcatcttgt 


tccgttaaac 


tattgacttc 


2940 


atgttgtaca 


ttgtttagtt 


cacgagaagg 


gtcctcttca 


ggcggtagct 


cctgatctcc 


3000 


tatatgacct 


ttatcctgtt 


ctctttccac 


aaacttagaa 


atgtattcat 


gaattatgga 


3060 


gcacctaata 


acattcttca 


aggcggagaa 


gtttgggcca 


gatgcccaat 


atgcttgaca 


3120 


tgaaaacgtg 


agaatgaatt 


tagtattatt 


gtgatattct 


gaggcaattt 


tattataatc 


3180 


tcgaagataa 


gagaagaatg 


cagtgacctt 


tgtattgaca 


aatggagatt 


ccatgtatct 


3240 


aaaaaatacg 


cctttaggcc 


ttctgatacc 


ctttcccctg 


cggtttagcg 


tgccttttac 


3300 


attaatatct 


aaaccctctc 


cgatggtggc 


ctttaactga 


ctaataaatg 


caaccgatat 


3360 


aaactgtgat 


aattctgggt 


gatttatgat 


tcgatcgaca 


attgtattgt 


acactagtgc 


3420 


aggatcaggc 


caatccagtt 


ctttttcaat 


taccggtgtg 


tcgtctgtat 


tcagtacatg 


3480 


tccaacaaat 


gcaaatgcta 


acgttttgta 


tttcttataa 


ttgtcaggaa 


ctggaaaagt 


3540 


cccccttgtc 


gtctcgatta 


cacacctact 


ttcatcgtac 


accataggtt 


ggaagtgctg 


3600 


cataatacat 


tgcttaatac 


aagcaagcag 


tctctcgcca 


ttcatatttc 


agttattttc 


3660 


cattacagct 


gatgtcattg 


tatatcagcg 


ctgtaaaaat 


ctatctgtta 


cagaaggttt 


3720 


tcgcggtttt 


tataaacaaa 


actttcgtta 


cgaaatcgag 


caatcacccc 


agctgcgtat 


3780 


ttggaaattc 


gggaaaaagt 


agagcaacgc 


gagttgcatt 


ttttacacca 


taatgcatga 


3840 


ttaacttcga 


gaagggatta 


aggctaattt 


cactagtatg 


tttcaaaaac 


ctcaatctgt 


3900 


ccattgaatg 


ccttataaaa 


cagctataga 


ttgcatagaa 


gagttagcta 


ctcaatgctt 


3960 


tttgtcaaag 


cttactgatg 


atgatgtgtc 


tactttcagg 


cgggtctgta 


gtaaggagaa 


4020 


tgacattata 


aagctggcac 


ttagaattcc 


acggactata 


gactatacta 


gtatactccg 


4080 


tctactgtac 


gatacacttc 


cgctcaggtc 


cttgtccttt 


aacgaggcct 


taccactctt 


4140 
ttgttactct


attgatccag 


ctcagcaaag 


gcagtgtgat 


ctaagattct 


atcttcgcga 


4200 


tgtagtaaaa 


ctagctagac 


cgagaaagag 


actagaaatg 


caaaaggcac 


ttctacaatg 


4260 


gctgccatca 


ttattatccg 


atgtgacgct 


gcattttttt 


tttttttttt 


tttttttttt 


4320 


tttttttttt 


tttttttttt 


ttttttggta 


caaatatcat 


aaaaaaagag 


aatcttttta 


4380 


agcaaggatt 


ttcttaactt 


cttcggcgac 


agcatcaccg 


acttcggtgg 


tactgttgga 


4440 


accacctaaa 


tcaccagttc 


tgatacctgc 


atccaaaacc 


tttttaactg 


catcttcaat 


4500 


ggctttacct 


tcttcaggca 


agttcaatga 


caatttcaac 


atcattgcag 


cagacaagat 


4560 


agtggcgata 


gggttgacct 


tattctttgg 


caaatctgga 


gcggaaccat 


ggcatggttc 


4620 


gtacaaacca 


aatgcggtgt 


tcttgtctgg 


caaagaggcc 


aaggacgcag 


atggcaacaa 


4680 


acccaaggag 


cctgggataa 


cggaggcttc 


atcggagatg 


atatcaccaa 


acatgttgct 


4740 


ggtgattata 


ataccattta 


ggtgggttgg 


gttcttaact 


aggatcatgg 


cggcagaatc 


4800 


aatcaattga 


tgttgaactt 


tcaatgtagg 


gaattcgttc 


ttgatggttt 


cctccacagt 


4860 


ttttctccat 


aatcttgaag 


aggccaaaac 


attagcttta 


tccaaggacc 


aaataggcaa 


4920 


tggtggctca 


tgttgtaggg 


ccatgaaagc 


ggccattctt 


gtgattcttt 


gcacttctgg 


4980 


aacggtgtat 


tgttcactat 


cccaagcgac 


accatcacca 


tcgtcttcct 


ttctcttacc 


5040 


aaagtaaata 


cctcccacta 


attctctaac 


aacaacgaag 


tcagtacctt 


tagcaaattg 


5100 


tggcttgatt 


ggagataagt 


ctaaaagaga 


gtcggatgca 


aagttacatg 


gtcttaagtt 


5160 


ggcgtacaat 


tgaagttctt 


tacggatttt 


tagtaaacct 


tgttcaggtc 


taacactacc 


5220 


ggtaccccat 


ttaggaccac 


ccacagcacc 


taacaaaacg 


gcatcagcct 


tcttggaggc 


5280 


ttccagcgcc 


tcatctggaa 


gtggaacacc 


tgtagcatcg 


atagcagcac 


caccaattaa 


5340 


atgattttcg 


aaatcgaact 


tgacattgga 


acgaacatca 


gaaatagctt 


taagaacctt 


5400 


aatggcttcg 


gctgtgattt 


cttgaccaac 


gtggtcacct 


ggcaaaacga 


cgatcttctt 


5460 


aggggcagac 


attacaatgg 


tatatccttg 


aaatatatat 


aaaaaaaaaa 


aaaaaaaaaa 


5520 


aaaaaaaaaa 


atgcagcttc 


tcaatgatat 


tcgaatacgc 


tttgaggaga 


tacagcctaa 


5580 


tatccgacaa 


actgttttac 


agatttacga 


tcgtacttgt 


tacccatcat 


tgaattttga 


5640 


acatccgaac 


ctgggagttt 


tccctgaaac 


agatagtata 


tttgaacctg 


tataataata 


5700 


tatagtctag 


cgctttacgg 


aagacaatgt 


atgtatttcg 


gttcctggag 


aaactattgc 


5760 


atctattgca 


taggtaatct 


tgcacgtcgc 


atccccggtt 


cattttctgc 


gtttccatct 


5820 


tgcacttcaa 


tagcatatct 


ttgttaacga 


agcatctgtg 


cttcattttg 


tagaacaaaa 


5880 


atgcaacgcg 


agagcgctaa 


tttttcaaac 


aaagaatctg 


agctgcattt 


ttacagaaca 


5940 
gaaatgcaac


gcgaaagcgc 


tattttacca 


acgaagaatc 


tgtgcttcat 


ttttgtaaaa 


6000 


caaaaatgca 


acgcgagagc 


gctaattttt 


caaacaaaga 


atctgagctg 


catttttaca 


6060 


gaacagaaat 


gcaacgcgag 


agcgctattt 


taccaacaaa 


gaatctatac 


ttcttttttg 


6120 


ttctacaaaa 


atgcatcccg 


agagcgctat 


ttttctaaca 


aagcatctta 


gattactttt 


6180 


tttctccttt 


gtgcgctcta 


taatgcagtc 


tcttgataac 


tttttgcact 


gtaggtccgt 


6240 


taaggttaga 


agaaggctac 


tttggtgtct 


attttctctt 


ccataaaaaa 


agcctgactc 


6300 


cacttcccgc 


gtttactgat 


tactagcgaa 


gctgcgggtg 


cattttttca 


agataaaggc 


6360 


atccccgatt 


atattctata 


ccgatgtgga 


ttgcgcatac 


tttgtgaaca 


gaaagtgata 


6420 


gcgttgatga 


ttcttcattg 


gtcagaaaat 


tatgaacggt 


ttcttctatt 


ttgtctctat 


6480 


atactacgta 


taggaaatgt 


ttacattttc 


gtattgtttt 


cgattcactc 


tatgaatagt 


6540 


tcttactaca 


atttttttgt 


ctaaagagta 


atactagaga 


taaacataaa 


aaatgtagag 


6600 


gtcgagttta 


gatgcaagtt 


caaggagcga 


aaggtggatg 


ggtaggttat 


atagggatat 


6660 


agcacagaga 


tatatagcaa 


agagatactt 


ttgagcaatg 


tttgtggaag 


cggtattcgc 


6720 


aatattttag 


tagctcgtta 


cagtccggtg 


cgtttttggt 


tttttgaaag 


tgcgtcttca 


6780 


gagcgctttt 


ggttttcaaa 


agcgctctga 


agttcctata 


ctttctagag 


aataggaact 


6840 


tcggaatagg 


aacttcaaag 


cgtttccgaa 


aacgagcgct 


tccgaaaatg 


caacgcgagc 


6900 


tgcgcacata 


cagctcactg 


ttcacgtcgc 


acctatatct 


gcgtgttgcc 


tgtatatata 


6960 


tatacatgag 


aagaacggca 


tagtgcgtgt 


ttatgcttaa 


atgcgtactt 


atatgcgtct 


7020 


atttatgtag 


gatgaaaggt 


agtctagtac 


ctcctgtgat 


attatcccat 


tccatgcggg 


7080 


gtatcgtatg 


cttccttcag 


cactaccctt 


tagctgttct 


atatgctgcc 


actcctcaat 


7140 


tggattagtc 


tcatccttca 


atgctatcat 


ttcctttgat 


attggatcat 


atgcatagta 


7200 


ccgagaaact 


agtgcgaagt 


agtgatcagg 


tattgctgtt 


atctgatgag 


tatacgttgt 


7260 


cctggccacg 


gcagaagcac 


gcttatcgct 


ccaatttccc 


acaacattag 


tcaactccgt 


7320 


taggcccttc 


attgaaagaa 


atgaggtcat 


caaatgtctt 


ccaatgtgag 


attttgggcc 


7380 


attttttata 


gcaaagattg 


aataaggcgc 


atttttcttc 


aaagctttat 


tgtacgatct 


7440 


gactaagtta 


tcttttaata 


attggtattc 


ctgtttattg 


cttgaagaat 


tgccggtcct 


7500 


atttactcgt 


tttaggactg 


gttcagaatt 


cctcaaaaat 


tcatccaaat 


atacaagtgg 


7560 


atcgatgata 


agctgtcaaa 


catgagaatt 


cttgaagacg 


aaagggcctc 


gtgatacgcc 


7620 


tatttttata 


ggttaatgtc 


atgataataa 


tggtttctta 


gacgtcaggt 


ggcacttttc 


7680 


ggggaaatgt 


gcgcggaacc 


cctatttgtt 


tatttttcta 


aatacattca 


aatatgtatc 


7740 
cgctcatgag


acaataaccc 


tgataaatgc 


ttcaataata 


ttgaaaaagg 


aagagtatga 


7800 


gtattcaaca 


tttccgtgtc 


gcccttattc 


ccttttttgc 


ggcattttgc 


cttcctgttt 


7860 


ttgctcaccc 


agaaacgctg 


gtgaaagtaa 


aagatgctga 


agatcagttg 


ggtgcacgag 


7920 


tgggttacat 


cgaactggat 


ctcaacagcg 


gtaagatcct 


tgagagtttt 


cgccccgaag 


7980 


aacgttttcc 


aatgatgagc 


acttttaaag 


ttctgctatg 


tggcgcggta 


ttatcccgtg 


8040 


ttgacgccgg 


gcaagagcaa 


ctcggtcgcc 


gcatacacta 


ttctcagaat 


gacttggttg 


8100 


agtactcacc 


agtcacagaa 


aagcatctta 


cggatggcat 


gacagtaaga 


gaattatgca 


8160 


gtgctgccat 


aaccatgagt 


gataacactg 


cggccaactt 


acttctgaca 


acgatcggag 


8220 


gaccgaagga 


gctaaccgct 


tttttgcaca 


acatggggga 


tcatgtaact 


cgccttgatc 


8280 


gttgggaacc 


ggagctgaat 


gaagccatac 


caaacgacga 


gcgtgacacc 


acgatgcctg 


8340 


cagcaatggc 


aacaacgttg 


cgcaaactat 


taactggcga 


actacttact 


ctagcttccc 


8400 


ggcaacaatt 


aatagactgg 


atggaggcgg 


ataaagttgc 


aggaccactt 


ctgcgctcgg 


8460 


cccttccggc 


tggctggttt 


attgctgata 


aatctggagc 


cggtgagcgt 


gggtctcgcg 


8520 


gtatcattgc 


agcactgggg 


ccagatggta 


agccctcccg 


tatcgtagtt 


atctacacga 


8580 


cggggagtca 


ggcaactatg 


gatgaacgaa 


atagacagat 


cgctgagata 


ggtgcctcac 


8640 


tgattaagca 


ttggtaactg 


tcagaccaag 


tttactcata 


tatactttag 


attgatttaa 


8700 


aacttcattt 


ttaatttaaa 


aggatctagg 


tgaagatcct 


ttttgataat 


ctcatgacca 


8760 


aaatccctta 


acgtgagttt 


tcgttccact 


gagcgtcaga 


ccccgtagaa 


aagatcaaag 


8820 


gatcttcttg 


agatcctttt 


tttctgcgcg 


taatctgctg 


cttgcaaaca 


aaaaaaccac 


8880 


cgctaccagc 


ggtggtttgt 


ttgccggatc 


aagagctacc 


aactcttttt 


ccgaaggtaa 


8940 


ctggcttcag 


cagagcgcag 


ataccaaata 


ctgtccttct 


agtgtagccg 


tagttaggcc 


9000 


accacttcaa 


gaactctgta 


gcaccgccta 


catacctcgc 


tctgctaatc 


ctgttaccag 


9060 


tggctgctgc 


cagtggcgat 


aagtcgtgtc 


ttaccgggtt 


ggactcaaga 


cgatagttac 


9120 


cggataaggc 


gcagcggtcg 


ggctgaacgg 


ggggttcgtg 


cacacagccc 


agcttggagc 


9180 


gaacgaccta 


caccgaactg 


agatacctac 


agcgtgagct 


atgagaaagc 


gccacgcttc 


9240 


ccgaagggag 


aaaggcggac 


aggtatccgg 


taagcggcag 


ggtcggaaca 


ggagagcgca 


9300 


cgagggagct 


tccaggggga 


aacgcctggt 


atctttatag 


tcctgtcggg 


tttcgccacc 


9360 


tctgacttga 


gcgtcgattt 


ttgtgatgct 


cgtcaggggg 


gcggagccta 


tggaaaaacg 


9420 


ccagcaacgc 


ggccttttta 


cggttcctgg 


ccttttgctg 


gccttttgct 


cacatgttct 


9480 


ttcctgcgtt 


atcccctgat 


tctgtggata 


accgtattac 


cgcctttgag 


tgagctgata 


9540 
ccgctcgccg


cagccgaacg 


accgagcgca 


gcgagtcagt 


gagcgaggaa 


gcggaagagc 


9600 


gcctgatgcg 


gtattttctc 


cttacgcatc 


tgtgcggtat 


ttcacaccgc 


atatggtgca 


9660 


ctctcagtac 


aatctgctct 


gatgccgcat 


agttaagcca 


gtatacactc 


cgctatcgct 


9720 


acgtgactgg 


gtcatggctg 


cgccccgaca 


cccgccaaca 


cccgctgacg 


cgccctgacg 


9780 


ggcttgtctg 


ctcccggcat 


ccgcttacag 


acaagctgtg 


accgtctccg 


ggagctgcat 


9840 


gtgtcagagg 


ttttcaccgt 


catcaccgaa 


acgcgcgagg 


cagctgcggt 


aaagctcatc 


9900 


agcgtggtcg 


tgaagcgatt 


cacagatgtc 


tgcctgttca 


tccgcgtcca 


gctcgttgag 


9960 


tttctccaga 


agcgttaatg 


tctggcttct 


gataaagcgg 


gccatgttaa 


gggcggtttt 


10020 


ttcctgtttg 


gtcactgatg 


cctccgtgta 


agggggattt 


ctgttcatgg 


gggtaatgat 


10080 


accgatgaaa 


cgagagagga 


tgctcacgat 


acgggttact 


gatgatgaac 


atgcccggtt 


10140 


actggaacgt 


tgtgagggta 


aacaactggc 


ggtatggatg 


cggcgggacc 


agagaaaaat 


10200 


cactcagggt 


caatgccagc 


gcttcgttaa 


tacagatgta 


ggtgttccac 


agggtagcca 


10260 


gcagcatcct 


gcgatgcaga 


tccggaacat 


aatggtgcag 


ggcgctgact 


tccgcgtttc 


10320 


cagactttac 


gaaacacgga 


aaccgaagac 


cattcatgtt 


gttgctcagg 


tcgcagacgt 


10380 


tttgcagcag 


cagtcgcttc 


acgttcgctc 


gcgtatcggt 


gattcattct 


gctaaccagt 


10440 


aaggcaaccc 


cgccagccta 


gccgggtcct 


caacgacagg 


agcacgatca 


tgcgcacccg 


10500 


tggccaggac 


ccaacgctgc 


ccgagatgcg 


ccgcgtgcgg 


ctgctggaga 


tggcggacgc 


10560 


gatggatatg 


ttctgccaag 


ggttggtttg 


cgcattcaca 


gttctccgca 


agaattgatt 


10620 


ggctccaatt 


cttggagtgg 


tgaatccgtt 


agcgaggtgc 


cgccggcttc 


cattcaggtc 


10680 


gaggtggccc 


ggctccatgc 


accgcgacgc 


aacgcgggga 


ggcagacaag 


gtatagggcg 


10740 


gcgcctacaa 


tccatgccaa 


cccgttccat 


gtgctcgccg 


aggcggcata 


aatcgccgtg 


10800 


acgatcagcg 


gtccaatgat 


cgaagttagg 


ctggtaagag 


ccgcgagcga 


tccttgaagc 


10860 


tgtccctgat 


ggtcgtcatc 


tacctgcctg 


gacagcatgg 


cctgcaacgc 


gggcatcccg 


10920 


atgccgccgg 


aagcgagaag 


aatcataatg 


gggaaggcca 


tccagcctcg 


cgtcgcgaac 


10980 


gccagcaaga 


cgtagcccag 


cgcgtcggcc 


gccatgccgg 


cgataatggc 


ctgcttctcg 


11040 


ccgaaacgtt 


tggtggcggg 


accagtgacg 


aaggcttgag 


cgagggcgtg 


caagattccg 


11100 


aataccgcaa 


gcgacaggcc 


gatcatcgtc 


gcgctccagc 


gaaagcggtc 


ctcgccgaaa 


11160 


atgacccaga 


gcgctgccgg 


cacctgtcct 


acgagttgca 


tgataaagaa 


gacagtcata 


11220 


agtgcggcga 


cgatagtcat 


gccccgcgcc 


caccggaagg 


agctgactgg 


gttgaaggct 


11280 


ctcaagggca 


tcggtcgagg 


atccttcaat 


atgcgcacat 


acgctgttat 


gttcaaggtc 


11340 
ccttcgttta


agaacgaaag 


cggtcttcct 


tttgagggat 


gtttcaagtt 


gttcaaatct 


11400 


atcaaatttg 


caaatcccca 


gtctgtatct 


agagcgttga 


atcggtgatg 


cgatttgtta 


11460 


attaaattga 


tggtgtcacc 


attaccaggt 


ctagatatac 


caatggcaaa 


ctgagcacaa 


11520 


caataccagt 


ccggatcaac 


tggcaccatc 


tctcccgtag 


tctcatctaa 


tttttcttcc 


11580 


ggatgaggtt 


ccagatatac 


cgcaacacct 


ttattatggt 


ttccctgagg 


gaataataga 


11640 


atgtcccatt 


cgaaatcacc 


aattctaaac 


ctgggcgaat 


tgtatttcgg


gtttgttaac 


11700 


tcgttccagt 


caggaatgtt 


ccacgtgaag 


ctatcttcca 


gcaaagtctc 


cacttcttca 


11760 


tcaaattgtg 


gagaatactc 


ccaatgctct 


tatctatggg 


acttccggga


aacacagtac 


11820 


cgatacttcc 


caattcgtct 


tcagagctca 


ttgtttgttt 


gaagagacta 


atcaaagaat 


11880 


cgttttctca 


aaaaaattaa 


tatcttaact 


gatagtttga 


tcaaaggggc 


aaaacgtagg


11940 


ggcaaacaaa 


cggaaaaatc 


gtttctcaaa 


ttttctgatg 


ccaagaactc 


taaccagtct 


12000 


tatctaaaaa 


ttgccttatg 


atccgtctct 


ccggttacag 


cctgtgtaac 


tgattaatcc 


12060 


tgcctttcta 


atcaccattc 


taatgtttta 


attaagggat 


tttgtcttca 


ttaacggctt


12120 


tcgctcataa 


aaatgttatg 


acgttttgcc


cgcaggcggg 


aaaccatcca 


cttcacgaga 


12180 


ctgatctcct 


ctgccggaac 


accgggcatc 


tccaacttat 


aagttggaga 


aataagagaa 


12240 


tttcagattg 


agagaatgaa 


aaaaaaaaac 


ccttagttca 


taggtccatt 


ctcttagcgc 


12300 


aactacagag 


aacaggggca 


caaacaggca 


aaaaacgggc


acaacctcaa 


tggagtgatg 


12360 


caacctgcct 


ggagtaaatg 


atgacacaag 


gcaattgacc 


cacgcatgta 


tctatctcat 


12420 


tttcttacac 


cttctattac 


cttctgctct 


ctctgatttg 


gaaaaagctg 


aaaaaaaagg 


12480 


ttgaaaccag 


ttccctgaaa 


ttattcccct 


acttgactaa 


taagtatata 


aagacggtag


12540 


gtattgattg 


taattctgta 


aatctatttc 


ttaaacttct 


taaattctac 


ttttatagtt 


12600 


agtctttttt 


ttagttttaa 


aacaccaaga 


acttagtttc 


gaataaacac 


acataaacaa 


12660 


acaagcttac 


aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255
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get aac ttg eta tec gta gag gaa get tgc age ctg acg ccc cca cac 16503 
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His 
1260 1265 1270 1275 

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala
1805 1810 1815

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc taatagtcga 18421
Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
1900 1905 1910

ctttgttccc actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta 18481
aacaacatgt tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca 18541
catactttat atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc 18601
tgcctgccat atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa 18661
aagatgaatg tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc 18721
gccatgatcg cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg 18781
ccaaagcggt cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat 18841
agcgctagca gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag 18901
aggcccggca gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg 18961
aggatgacga tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt 19021
aactgtgata aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta 19081
taaaaatcat tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta 19141
atttgtgagt ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg 19201
catcttctca aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca 19261
tcccttccct ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc 19321
acatcatcca cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca 19381
ccgggtgtca taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca 19441
ataaagccga taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat 19501
tctccagtag atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt 19561
tcctttgtta cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca 19621
ccgtgtgcat tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc 19681
aatttgactg tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac 19741
ttggcggata atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata 19801
tccacatgtg tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat 19861
tccttggtgg tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata 19921
ttaaatagct tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct 19981
ttcgacatga tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat 20041
actgggcaat ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg 20101
tgctccttcc ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga 20161
aaccgaaatc aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20217



<210> 17 
<211> 1911 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core140

<400> 17 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415 

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685 

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880 

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325 

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser
1810 1815 1820

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg
1860 1865 1870

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg
1875 1880 1885

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu
1890 1895 1900

Met Gly Tyr Ile Pro Leu Val
1905 1910



<210> 18
<211> 20247
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core150

<220>
<221> CDS
<222> (12679)..(18441)
<400> 18 



atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440


atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240


aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920


tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980

PCT/US00/32326

aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840


tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640


tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880
cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940
ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000
tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060
tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120
tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180
ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240
tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300
aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360
caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420
tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480
ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540
gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600
agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660

acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala
1805 1810 1815

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc ggc gcc cct ctt 18423
Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu
1900 1905 1910 1915

gga ggc gct gcc agg gcc taatagtcga ctttgttccc actgtacttt 18471
Gly Gly Ala Ala Arg Ala
1920



tagctcgtac aaaatacaat atacttttca tttctccgta aacaacatgt tttcccatgt 18531
aatatccttt tctatttttc gttccgttac caactttaca catactttat atagctattc 18591
acttctatac actaaaaaac taagacaatt ttaattttgc tgcctgccat atttcaattt 18651
gttataaatt cctataattt atcctattag tagctaaaaa aagatgaatg tgaatcgaat 18711
cctaagagaa ttggatctga tccacaggac gggtgtggtc gccatgatcg cgtagtcgat 18771
agtggctcca agtagcgaag cgagcaggac tgggcggcgg ccaaagcggt cggacagtgc 18831
tccgagaacg ggtgcgcata gaaattgcat caacgcatat agcgctagca gcacgccata 18891
gtgactggcg atgctgtcgg aatggacgat atcccgcaag aggcccggca gtaccggcat 18951
aaccaagcct atgcctacag catccagggt gacggtgccg aggatgacga tgagcgcatt 19011
gttagatttc atacacggtg cctgactgcg ttagcaattt aactgtgata aactaccgca 19071
ttaaagcttt ttctttccaa tttttttttt ttcgtcatta taaaaatcat tacgaccgag 19131
attcccgggt aataactgat ataattaaat tgaagctcta atttgtgagt ttagtataca 19191
tgcatttact tataatacag ttttttagtt ttgctggccg catcttctca aatatgcttc 19251
ccagcctgct tttctgtaac gttcaccctc taccttagca tcccttccct ttgcaaatag 19311
tcctcttcca acaataataa tgtcagatcc tgtagagacc acatcatcca cggttctata 19371
ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca ccgggtgtca taatcaacca 19431
atcgtaacct tcatctcttc cacccatgtc tctttgagca ataaagccga taacaaaatc 19491
tttgtcgctc ttcgcaatgt caacagtacc cttagtatat tctccagtag atagggagcc 19551
cttgcatgac aattctgcta acatcaaaag gcctctaggt tcctttgtta cttcttctgc 19611
cgcctgcttc aaaccgctaa caatacctgg gcccaccaca ccgtgtgcat tcgtaatgtc 19671
tgcccattct gctattctgt atacacccgc agagtactgc aatttgactg tattaccaat 19731
gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac ttggcggata atgcctttag 19791
cggcttaact gtgccctcca tggaaaaatc agtcaagata tccacatgtg tttttagtaa 19851
acaaattttg ggacctaatg cttcaactaa ctccagtaat tccttggtgg tacgaacatc 19911
caatgaagca cacaagtttg tttgcttttc gtgcatgata ttaaatagct tggcagcaac 19971
aggactagga tgagtagcag cacgttcctt atatgtagct ttcgacatga tttatcttcg 20031
tttcctgcag gtttttgttc tgtgcagttg ggttaagaat actgggcaat ttcatgtttc 20091
ttcaacacta catatgcgta tatataccaa tctaagtctg tgctccttcc ttcgttcttc 20151
cttctgttcg gagattaccg aatcaaaaaa atttcaagga aaccgaaatc aaaaaaaaga 20211
ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20247



<210> 19 
<211> 1921 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:
pd.delta.NS3NS5.pj.core150

<400> 19 

Met Ala Ala Tyr Ala Ala Gin Gly Tyr Lys Val Leu Val Leu Asn Pro 
15 10 15 

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His 
20 25 30 

Gly lie Asp Pro Asn He Arg Thr Gly Val Arg Thr He Thr Thr Gly 
35 40 45 

Ser Pro He Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly 
50 *, 55 60 

Cys Ser Gly Gly Ala Tyr Asp He He He Cys Asp Glu Cys His Ser 
65 70 75 80 

Thr Asp Ala Thr Ser He Leu Gly He Gly Thr Val Leu Asp Gin Ala 
85 90 95 

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn lie Glu Glu Val Ala Leu Ser 
115 120 125 

Thr Thr Gly Glu He Pro Phe Tyr Gly Lys Ala He Pro Leu Glu Val 
130 135 140 

He Lys Gly Gly Arg His Leu He Phe Cys His Ser Lys Lys Lys Cys 
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145 



150 



155 



160 



Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly lie Asn Ala Val Ala 

165 170 175 

Tyr Tyr Arg Gly Leu Asp Val Ser Val lie Pro Thr Ser Gly Asp Val 

180 185 190 

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 

195 200 205 

Asp Ser Val lie Asp Cys Asn Thr Cys Val Thr Gin Thr Val Asp Phe 
210 215 220 

Ser Leu Asp Pro Thr Phe Thr lie Glu Thr lie Thr Leu Pro Gin Asp 

225 230 235 240 

Ala Val Ser Arg Thr Gin Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro 

245 250 255 

Gly lie Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe 

260 265 270 

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620 

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630



Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695



Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser
1810 1815 1820

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg
1860 1865 1870

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg
1875 1880 1885

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu
1890 1895 1900

Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu Gly Gly Ala Ala Arg
1905 1910 1915 1920

Ala